Preface


This text is intended as an introduction to elementary probability theory and 
stochastic processes. It is particularly well suited for those wanting to see how
probability theory can be applied to the study of phenomena in fields such as
engineering, computer science, management science, the physical and social sciences,
and operations research. 

It is generally felt that there are two approaches to the study of probability
theory. One approach is heuristic and nonrigorous and attempts to develop in the
student an intuitive feel for the subject that enables him or her to “think
probabilistically.” The other approach attempts a rigorous development of probability
by using the tools of measure theory. It is the first approach that is employed 
in this text. However, because it is extremely important in both understanding 
and applying probability theory to be able to “think probabilistically,” this text 
should also be useful to students interested primarily in the second approach. 


New to This Edition 


The tenth edition includes new text material, examples, and exercises chosen not 
only for their inherent interest and applicability but also for their usefulness in 
strengthening the reader’s probabilistic knowledge and intuition. The new text 
material includes Section 2.7, which builds on the inclusion/exclusion identity to 
find the distribution of the number of events that occur; and Section 3.6.6 on left 
skip free random walks, which can be used to model the fortunes of an investor 
(or gambler) who always invests 1 and then receives a nonnegative integral return. 
Section 4.2 has additional material on Markov chains that shows how to modify a 
given chain when trying to determine such things as the probability that the chain 
ever enters a given class of states by some time, or the conditional distribution of 
the state at some time given that the class has never been entered. A new remark 
in Section 7.2 shows that results from the classical insurance ruin model also hold 
in other important ruin models. There is new material on exponential queueing 
models, including, in Section 2.2, a determination of the mean and variance of 
the number of lost customers in a busy period of a finite capacity queue, as well as
the new Section 8.3.3 on birth and death queueing models. Section 11.8.2 gives
a new approach that can be used to simulate the exact stationary distribution of 
a Markov chain that satisfies a certain property. 

Among the newly added examples are 1.11, which is concerned with a multiple 
player gambling problem; 3.20, which finds the variance in the matching rounds 
problem; 3.30, which deals with the characteristics of a random selection from a 
population; and 4.25, which deals with the stationary distribution of a Markov 
chain. 


Course 


Ideally, this text would be used in a one-year course in probability models. Other 
possible courses would be a one-semester course in introductory probability 
theory (involving Chapters 1-3 and parts of others) or a course in elementary 
stochastic processes. The textbook is designed to be flexible enough to be used 
in a variety of possible courses. For example, I have used Chapters 5 and 8, with 
smatterings from Chapters 4 and 6, as the basis of an introductory course in 
queueing theory. 


Examples and Exercises 


Many examples are worked out throughout the text, and there are also a large 
number of exercises to be solved by students. More than 100 of these exercises 
have been starred and their solutions provided at the end of the text. These starred 
problems can be used for independent study and test preparation. An Instructor’s 
Manual, containing solutions to all exercises, is available free to instructors who 
adopt the book for class. 


Organization 


Chapters 1 and 2 deal with basic ideas of probability theory. In Chapter 1 an 
axiomatic framework is presented, while in Chapter 2 the important concept of 
a random variable is introduced. Subsection 2.6.1 gives a simple derivation of 
the joint distribution of the sample mean and sample variance of a normal data 
sample. 

Chapter 3 is concerned with the subject matter of conditional probability and 
conditional expectation. “Conditioning” is one of the key tools of probability 
theory, and it is stressed throughout the book. When properly used, conditioning 
often enables us to easily solve problems that at first glance seem quite
difficult. The final section of this chapter presents applications to (1) a computer list
problem, (2) a random graph, and (3) the Polya urn model and its relation to 
the Bose-Einstein distribution. Subsection 3.6.5 presents k-record values and the 
surprising Ignatov’s theorem. 

In Chapter 4 we come into contact with our first random, or stochastic, process,
known as a Markov chain, which is widely applicable to the study of many
real-world phenomena. Applications to genetics and production processes are
presented. The concept of time reversibility is introduced and its usefulness
illustrated. Subsection 4.5.3 presents an analysis, based on random walk theory, of a
probabilistic algorithm for the satisfiability problem. Section 4.6 deals with the 
mean times spent in transient states by a Markov chain. Section 4.9 introduces 
Markov chain Monte Carlo methods. In the final section we consider a model 
for optimally making decisions known as a Markovian decision process. 

In Chapter 5 we are concerned with a type of stochastic process known as a 
counting process. In particular, we study a kind of counting process known as 
a Poisson process. The intimate relationship between this process and the
exponential distribution is discussed. New derivations for the Poisson and
nonhomogeneous Poisson processes are discussed. Examples relating to analyzing greedy
algorithms, minimizing highway encounters, collecting coupons, and tracking 
the AIDS virus, as well as material on compound Poisson processes, are included 
in this chapter. Subsection 5.2.4 gives a simple derivation of the convolution of 
exponential random variables. 

Chapter 6 considers Markov chains in continuous time with an emphasis on 
birth and death models. Time reversibility is shown to be a useful concept, as it 
is in the study of discrete-time Markov chains. Section 6.7 presents the
computationally important technique of uniformization.

Chapter 7, the renewal theory chapter, is concerned with a type of counting 
process more general than the Poisson. By making use of renewal reward
processes, limiting results are obtained and applied to various fields. Section 7.9
presents new results concerning the distribution of time until a certain pattern
occurs when a sequence of independent and identically distributed random
variables is observed. In Subsection 7.9.1, we show how renewal theory can be used
to derive both the mean and the variance of the length of time until a specified 
pattern appears, as well as the mean time until one of a finite number of specified 
patterns appears. In Subsection 7.9.2, we suppose that the random variables are 
equally likely to take on any of m possible values, and compute an expression 
for the mean time until a run of m distinct values occurs. In Subsection 7.9.3, we 
suppose the random variables are continuous and derive an expression for the 
mean time until a run of m consecutive increasing values occurs. 

Chapter 8 deals with queueing, or waiting line, theory. After some
preliminaries dealing with basic cost identities and types of limiting probabilities, we
consider exponential queueing models and show how such models can be
analyzed. Included in the models we study is the important class known as a network
of queues. We then study models in which some of the distributions are allowed to 
be arbitrary. Included are Subsection 8.6.3 dealing with an optimization problem 
concerning a single server, general service time queue, and Section 8.8, concerned 
with a single server, general service time queue in which the arrival source is a 
finite number of potential users. 

Chapter 9 is concerned with reliability theory. This chapter will probably be
of greatest interest to the engineer and operations researcher. Subsection 9.6.1 
illustrates a method for determining an upper bound for the expected life of a 
parallel system of not necessarily independent components and Subsection 9.7.1 
analyzes a series structure reliability model in which components enter a state of 
suspended animation when one of their cohorts fails. 

Chapter 10 is concerned with Brownian motion and its applications. The theory 
of options pricing is discussed. Also, the arbitrage theorem is presented and its 
relationship to the duality theorem of linear programming is indicated. We show 
how the arbitrage theorem leads to the Black-Scholes option pricing formula. 

Chapter 11 deals with simulation, a powerful tool for analyzing stochastic 
models that are analytically intractable. Methods for generating the values of 
arbitrarily distributed random variables are discussed, as are variance reduction 
methods for increasing the efficiency of the simulation. Subsection 11.6.4
introduces the valuable simulation technique of importance sampling, and indicates
the usefulness of tilted distributions when applying this method. 


Introduction to
Probability Theory


1.1 Introduction 


Any realistic model of a real-world phenomenon must take into account the
possibility of randomness. That is, more often than not, the quantities we are interested
in will not be predictable in advance but, rather, will exhibit an inherent
variation that should be taken into account by the model. This is usually accomplished
by allowing the model to be probabilistic in nature. Such a model is, naturally 
enough, referred to as a probability model. 

The majority of the chapters of this book will be concerned with different 
probability models of natural phenomena. Clearly, in order to master both the 
“model building” and the subsequent analysis of these models, we must have a 
certain knowledge of basic probability theory. The remainder of this chapter, as 
well as the next two chapters, will be concerned with a study of this subject. 


1.2 Sample Space and Events 


Suppose that we are about to perform an experiment whose outcome is not 
predictable in advance. However, while the outcome of the experiment will not 
be known in advance, let us suppose that the set of all possible outcomes is known. 
This set of all possible outcomes of an experiment is known as the sample space 
of the experiment and is denoted by S. 

Some examples are the following.

1. If the experiment consists of the flipping of a coin, then

   S = {H, T}

   where H means that the outcome of the toss is a head and T that it is a tail.

2. If the experiment consists of rolling a die, then the sample space is

   S = {1, 2, 3, 4, 5, 6}

   where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6.

3. If the experiment consists of flipping two coins, then the sample space consists of the
   following four points:

   S = {(H, H), (H, T), (T, H), (T, T)}

   The outcome will be (H, H) if both coins come up heads; it will be (H, T) if the
   first coin comes up heads and the second comes up tails; it will be (T, H) if the
   first comes up tails and the second heads; and it will be (T, T) if both coins come
   up tails.

4. If the experiment consists of rolling two dice, then the sample space consists of the
   following 36 points:

   (1,1), (1,2), (1,3), (1,4), (1,5), (1,6)
   (2,1), (2,2), (2,3), (2,4), (2,5), (2,6)
   (3,1), (3,2), (3,3), (3,4), (3,5), (3,6)
   (4,1), (4,2), (4,3), (4,4), (4,5), (4,6)
   (5,1), (5,2), (5,3), (5,4), (5,5), (5,6)
   (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)

   where the outcome (i, j) is said to occur if i appears on the first die and j on the second
   die.

5. If the experiment consists of measuring the lifetime of a car, then the sample space
   consists of all nonnegative real numbers. That is,

   S = [0, ∞)*
Any subset E of the sample space S is known as an event. Some examples of
events are the following.

1'. In Example (1), if E = {H}, then E is the event that a head appears on the flip of the
    coin. Similarly, if E = {T}, then E would be the event that a tail appears.

2'. In Example (2), if E = {1}, then E is the event that one appears on the roll of the
    die. If E = {2, 4, 6}, then E would be the event that an even number appears on
    the roll.

3'. In Example (3), if E = {(H, H), (H, T)}, then E is the event that a head appears on
    the first coin.

4'. In Example (4), if E = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}, then E is the event
    that the sum of the dice equals seven.

5'. In Example (5), if E = (2, 6), then E is the event that the car lasts between two and six
    years.

* The set (a, b) is defined to consist of all points x such that a < x < b. The set [a, b] is defined
to consist of all points x such that a ≤ x ≤ b. The sets (a, b] and [a, b) are defined, respectively, to
consist of all points x such that a < x ≤ b and all points x such that a ≤ x < b.


We say that the event E occurs when the outcome of the experiment lies in E. 
For any two events E and F of a sample space S we define the new event E ∪ F
to consist of all outcomes that are either in E or in F or in both E and F. That is,
the event E ∪ F will occur if either E or F occurs. For example, in (1) if E = {H}
and F = {T}, then


E ∪ F = {H, T}


That is, E ∪ F would be the whole sample space S. In (2) if E = {1, 3, 5} and
F = {1, 2, 3}, then


E ∪ F = {1, 2, 3, 5}


and thus E ∪ F would occur if the outcome of the die is 1 or 2 or 3 or 5. The
event E ∪ F is often referred to as the union of the event E and the event F.

For any two events E and F, we may also define the new event EF, sometimes
written E ∩ F, and referred to as the intersection of E and F, as follows. EF consists
of all outcomes which are both in E and in F. That is, the event EF will occur
only if both E and F occur. For example, in (2) if E = {1, 3, 5} and F = {1, 2, 3},
then


EF = {1, 3}


and thus EF would occur if the outcome of the die is either 1 or 3. In
Example (1) if E = {H} and F = {T}, then the event EF would not consist of any
outcomes and hence could not occur. To give such an event a name, we shall
refer to it as the null event and denote it by ∅. (That is, ∅ refers to the event
consisting of no outcomes.) If EF = ∅, then E and F are said to be mutually
exclusive.

We also define unions and intersections of more than two events in a similar
manner. If $E_1, E_2, \ldots$ are events, then the union of these events, denoted by
$\bigcup_{n=1}^{\infty} E_n$, is defined to be the event that consists of all outcomes that are in $E_n$
for at least one value of $n = 1, 2, \ldots$. Similarly, the intersection of the events $E_n$,
denoted by $\bigcap_{n=1}^{\infty} E_n$, is defined to be the event consisting of those outcomes that
are in all of the events $E_n$, $n = 1, 2, \ldots$.

Finally, for any event E we define the new event $E^c$, referred to as the
complement of E, to consist of all outcomes in the sample space S that are not
in E. That is, $E^c$ will occur if and only if E does not occur. In Example (4)
if E = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}, then $E^c$ will occur if the sum of
the dice does not equal seven. Also note that since the experiment must result in
some outcome, it follows that $S^c = \emptyset$.
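
The event operations of this section can be tried out on a computer by representing events as sets. The following is a minimal Python sketch, not part of the original text; it uses the two-dice sample space of Example (4), and the names S, E, and F are chosen here only for illustration.

```python
from itertools import product

# Sample space for rolling two dice (Example 4): 36 ordered pairs (i, j).
S = set(product(range(1, 7), repeat=2))

# Events are subsets of S.
E = {(i, j) for (i, j) in S if i + j == 7}   # the sum of the dice equals seven
F = {(i, j) for (i, j) in S if i == 4}       # the first die shows four

union = E | F            # E ∪ F: outcomes in E or in F or in both
intersection = E & F     # EF: outcomes in both E and F
complement_E = S - E     # complement of E: outcomes of S that are not in E
null_event = S - S       # the complement of S is the null event

print(len(S), len(union), sorted(intersection), len(complement_E), null_event)
```

Python's built-in set operators correspond directly to union, intersection, and complement (relative to S), which is why no extra library is needed.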


1.3 Probabilities Defined on Events 


Consider an experiment whose sample space is S. For each event E of the sample 
space S, we assume that a number P(E) is defined and satisfies the following three 
conditions: 


(i) $0 \le P(E) \le 1$.
(ii) $P(S) = 1$.
(iii) For any sequence of events $E_1, E_2, \ldots$ that are mutually exclusive, that is, events for
which $E_n E_m = \emptyset$ when $n \ne m$, then

$$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$$


We refer to P(E) as the probability of the event E. 


Example 1.1 In the coin tossing example, if we assume that a head is equally 
likely to appear as a tail, then we would have 


$$P(\{H\}) = P(\{T\}) = \tfrac{1}{2}$$


On the other hand, if we had a biased coin and felt that a head was twice as likely
to appear as a tail, then we would have


$$P(\{H\}) = \tfrac{2}{3}, \qquad P(\{T\}) = \tfrac{1}{3}$$


Example 1.2 In the die tossing example, if we supposed that all six numbers 
were equally likely to appear, then we would have 


$$P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \tfrac{1}{6}$$


From (iii) it would follow that the probability of getting an even number would 
equal 


$$P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \tfrac{1}{2}$$

Remark We have chosen to give a rather formal definition of probabilities as 
being functions defined on the events of a sample space. However, it turns out 
that these probabilities have a nice intuitive property. Namely, if our experiment 
is repeated over and over again then (with probability 1) the proportion of time
that event E occurs will just be P(E). 


Since the events E and $E^c$ are always mutually exclusive and since $E \cup E^c = S$
we have by (ii) and (iii) that

$$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c)$$

or

$$P(E^c) = 1 - P(E) \tag{1.1}$$


In words, Equation (1.1) states that the probability that an event does not occur 
is one minus the probability that it does occur. 

We shall now derive a formula for P(E ∪ F), the probability of all outcomes
either in E or in F. To do so, consider P(E) + P(F), which is the probability of all
outcomes in E plus the probability of all points in F. Since any outcome that is
in both E and F will be counted twice in P(E) + P(F) and only once in P(E ∪ F),
we must have

$$P(E) + P(F) = P(E \cup F) + P(EF)$$

or equivalently

$$P(E \cup F) = P(E) + P(F) - P(EF) \tag{1.2}$$

Note that when E and F are mutually exclusive (that is, when EF = ∅), then
Equation (1.2) states that

$$P(E \cup F) = P(E) + P(F) - P(\emptyset) = P(E) + P(F)$$

a result which also follows from condition (iii). (Why is P(∅) = 0?)


Example 1.3 Suppose that we toss two coins, and suppose that we assume that 
each of the four outcomes in the sample space 


S = {(H, H), (H, T), (T, H), (T, T)}
is equally likely and hence has probability 1/4. Let
E = {(H,H),(H,T)} and F = {(H,H),(T,H)} 


That is, E is the event that the first coin falls heads, and F is the event that the 
second coin falls heads. 
By Equation (1.2) we have that P(E ∪ F), the probability that either the first or
the second coin falls heads, is given by

$$\begin{aligned}
P(E \cup F) &= P(E) + P(F) - P(EF) \\
&= \tfrac{1}{2} + \tfrac{1}{2} - P(\{(H, H)\}) \\
&= 1 - \tfrac{1}{4} = \tfrac{3}{4}
\end{aligned}$$

This probability could, of course, have been computed directly since

$$P(E \cup F) = P(\{(H, H), (H, T), (T, H)\}) = \tfrac{3}{4}$$
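
The two ways of computing the probability in Example 1.3 can be compared with a short enumeration. This Python sketch assumes, as the example does, that the four outcomes are equally likely; the helper prob is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Two coins: four equally likely outcomes (Example 1.3).
S = set(product("HT", repeat=2))

def prob(event):
    """P(E) when all outcomes of S are equally likely."""
    return Fraction(len(event), len(S))

E = {s for s in S if s[0] == "H"}   # the first coin falls heads
F = {s for s in S if s[1] == "H"}   # the second coin falls heads

direct = prob(E | F)                           # count the outcomes in E ∪ F
by_formula = prob(E) + prob(F) - prob(E & F)   # Equation (1.2)
print(direct, by_formula)  # 3/4 3/4
```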


We may also calculate the probability that any one of the three events E or F
or G occurs. This is done as follows:

$$P(E \cup F \cup G) = P((E \cup F) \cup G)$$

which by Equation (1.2) equals

$$P(E \cup F) + P(G) - P((E \cup F)G)$$

Now we leave it for you to show that the events (E ∪ F)G and EG ∪ FG are
equivalent, and hence the preceding equals

$$\begin{aligned}
P(E \cup F \cup G)
&= P(E) + P(F) - P(EF) + P(G) - P(EG \cup FG) \\
&= P(E) + P(F) - P(EF) + P(G) - P(EG) - P(FG) + P(EGFG) \\
&= P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)
\end{aligned} \tag{1.3}$$


In fact, it can be shown by induction that, for any n events $E_1, E_2, E_3, \ldots, E_n$,

$$\begin{aligned}
P(E_1 \cup E_2 \cup \cdots \cup E_n) ={} & \sum_{i} P(E_i) - \sum_{i<j} P(E_i E_j) + \sum_{i<j<k} P(E_i E_j E_k) \\
& - \sum_{i<j<k<l} P(E_i E_j E_k E_l) \\
& + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)
\end{aligned} \tag{1.4}$$

In words, Equation (1.4), known as the inclusion-exclusion identity, states that
the probability of the union of n events equals the sum of the probabilities of
these events taken one at a time minus the sum of the probabilities of these events
taken two at a time plus the sum of the probabilities of these events taken three
at a time, and so on.
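
The inclusion-exclusion identity can be checked numerically on a small example. The sketch below is an illustration only: it evaluates the right-hand side of Equation (1.4) for a list of events and compares it with the probability of the union computed directly. The two-dice sample space, the three events, and the function name are assumptions made here for the demonstration.

```python
from fractions import Fraction
from itertools import combinations, product

def union_prob_by_inclusion_exclusion(events, prob):
    """Right-hand side of Equation (1.4) for a list of events given as sets."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1           # (-1)^(r+1)
        for group in combinations(events, r):     # events taken r at a time
            total += sign * prob(set.intersection(*group))
    return total

# Two dice with 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

E = {(i, j) for (i, j) in S if i + j == 7}    # the sum equals seven
F = {(i, j) for (i, j) in S if i == 4}        # the first die shows four
G = {(i, j) for (i, j) in S if j % 2 == 1}    # the second die is odd

direct = prob(E | F | G)
by_identity = union_prob_by_inclusion_exclusion([E, F, G], prob)
print(direct, by_identity)
```

With n events the sum has 2^n − 1 terms, so this direct evaluation is practical only for a small number of events.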


1.4 Conditional Probabilities


Suppose that we toss two dice and that each of the 36 possible outcomes is equally 
likely to occur and hence has probability 1/36. Suppose that we observe that the
first die is a four. Then, given this information, what is the probability that the
sum of the two dice equals six? To calculate this probability we reason as follows:
Given that the initial die is a four, it follows that there can be at most six possible
outcomes of our experiment, namely, (4,1), (4,2), (4,3), (4,4), (4,5), and (4,6).
Since each of these outcomes originally had the same probability of occurring,
they should still have equal probabilities. That is, given that the first die is a four,
then the (conditional) probability of each of the outcomes (4,1), (4,2), (4,3),
(4,4), (4,5), (4,6) is 1/6 while the (conditional) probability of the other 30 points
in the sample space is 0. Hence, the desired probability will be 1/6.

If we let E and F denote, respectively, the event that the sum of the dice is 
six and the event that the first die is a four, then the probability just obtained 
is called the conditional probability that E occurs given that F has occurred and 
is denoted by 


P(E|F)


A general formula for P(E|F) that is valid for all events E and F is derived in the 
same manner as the preceding. Namely, if the event F occurs, then in order for 
E to occur it is necessary for the actual occurrence to be a point in both E and 
in F, that is, it must be in EF. Now, because we know that F has occurred, it 
follows that F becomes our new sample space and hence the probability that the 
event EF occurs will equal the probability of EF relative to the probability of F. 
That is, 


$$P(E \mid F) = \frac{P(EF)}{P(F)} \tag{1.5}$$


Note that Equation (1.5) is only well defined when P(F) > 0 and hence P(E|F) 
is only defined when P(F) > 0. 
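
Equation (1.5) translates directly into a short computation when the outcomes are equally likely. The following Python sketch reproduces the two-dice calculation that opened this section; the function names prob and cond_prob are introduced here for illustration and are not notation from the text.

```python
from fractions import Fraction
from itertools import product

# Two dice with 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

def cond_prob(E, F):
    """P(E|F) = P(EF) / P(F), which is defined only when P(F) > 0."""
    if prob(F) == 0:
        raise ValueError("P(E|F) is undefined when P(F) = 0")
    return prob(E & F) / prob(F)

E = {(i, j) for (i, j) in S if i + j == 6}   # the sum of the dice is six
F = {(i, j) for (i, j) in S if i == 4}       # the first die is a four

print(cond_prob(E, F))  # 1/6
```

The explicit check on P(F) mirrors the remark that the conditional probability is only defined when P(F) > 0.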


Example 1.4 Suppose cards numbered one through ten are placed in a hat, 
mixed up, and then one of the cards is drawn. If we are told that the number 
on the drawn card is at least five, then what is the conditional probability that 
it is ten? 


Solution: Let E denote the event that the number of the drawn card is ten, 
and let F be the event that it is at least five. The desired probability is P(E|F). 
Now, from Equation (1.5) 


$$P(E \mid F) = \frac{P(EF)}{P(F)}$$

However, EF = E since the number of the card will be both ten and at least
five if and only if it is number ten. Hence,

$$P(E \mid F) = \frac{1/10}{6/10} = \frac{1}{6}$$


Example 1.5 A family has two children. What is the conditional probability that 
both are boys given that at least one of them is a boy? Assume that the sample 
space S is given by S = {(b, b), (b, g), (g, b), (g, g)}, and all outcomes are equally
likely. ((b, g) means, for instance, that the older child is a boy and the younger 
child a girl.) 


Solution: Letting B denote the event that both children are boys, and A the 
event that at least one of them is a boy, then the desired probability is given by 


$$P(B \mid A) = \frac{P(BA)}{P(A)} = \frac{P(\{(b, b)\})}{P(\{(b, b), (b, g), (g, b)\})} = \frac{1/4}{3/4} = \frac{1}{3}$$


Example 1.6 Bev can either take a course in computers or in chemistry. If Bev 
takes the computer course, then she will receive an A grade with probability 1/2; if
she takes the chemistry course then she will receive an A grade with probability 1/3.
Bev decides to base her decision on the flip of a fair coin. What is the probability 
that Bev will get an A in chemistry? 


Solution: If we let C be the event that Bev takes chemistry and A denote the 
event that she receives an A in whatever course she takes, then the desired 
probability is P(AC). This is calculated by using Equation (1.5) as follows: 


$$P(AC) = P(C)P(A \mid C) = \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{6}$$


Example 1.7 Suppose an urn contains seven black balls and five white balls. We
draw two balls from the urn without replacement. Assuming that each ball in the 
urn is equally likely to be drawn, what is the probability that both drawn balls 
are black? 


Solution: Let F and E denote, respectively, the events that the first and second 
balls drawn are black. Now, given that the first ball selected is black, there are 
six remaining black balls and five white balls, and so P(E|F) = 6/11. As P(F) is
clearly 7/12, our desired probability is

$$P(EF) = P(F)P(E \mid F) = \tfrac{7}{12} \cdot \tfrac{6}{11} = \tfrac{42}{132}$$
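
The product P(F)P(E|F) in Example 1.7 can be confirmed by listing every ordered pair of distinct balls. This Python sketch is an illustration under the example's assumption that each ball is equally likely to be drawn; labeling the balls by their position in a list is a device introduced here.

```python
from fractions import Fraction
from itertools import permutations

# Seven black and five white balls, distinguished by their index in the list.
balls = ["B"] * 7 + ["W"] * 5

# All ordered draws of two distinct balls; each is equally likely.
draws = list(permutations(range(len(balls)), 2))
both_black = [d for d in draws if balls[d[0]] == "B" and balls[d[1]] == "B"]

p_both_black = Fraction(len(both_black), len(draws))
print(len(both_black), len(draws), p_both_black)  # 42 132 7/22
```

The count 42 out of 132 matches the product (7/12)(6/11), which reduces to 7/22.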


Example 1.8 Suppose that each of three men at a party throws his hat into the
center of the room. The hats are first mixed up and then each man randomly selects
a hat. What is the probability that none of the three men selects his own hat?


Solution: We shall solve this by first calculating the complementary probability
that at least one man selects his own hat. Let us denote by $E_i$, $i = 1, 2, 3$,
the event that the ith man selects his own hat. To calculate the probability
$P(E_1 \cup E_2 \cup E_3)$, we first note that

$$\begin{aligned}
P(E_i) &= \tfrac{1}{3}, \quad i = 1, 2, 3 \\
P(E_i E_j) &= \tfrac{1}{6}, \quad i \ne j \\
P(E_1 E_2 E_3) &= \tfrac{1}{6}
\end{aligned} \tag{1.6}$$

To see why Equation (1.6) is correct, consider first

$$P(E_i E_j) = P(E_i) P(E_j \mid E_i)$$

Now $P(E_i)$, the probability that the ith man selects his own hat, is clearly 1/3
since he is equally likely to select any of the three hats. On the other hand,
given that the ith man has selected his own hat, then there remain two hats
that the jth man may select, and as one of these two is his own hat, it follows
that with probability 1/2 he will select it. That is, $P(E_j \mid E_i) = \tfrac{1}{2}$ and so

$$P(E_i E_j) = P(E_i) P(E_j \mid E_i) = \tfrac{1}{3} \cdot \tfrac{1}{2} = \tfrac{1}{6}$$

To calculate $P(E_1 E_2 E_3)$ we write

$$P(E_1 E_2 E_3) = P(E_1 E_2) P(E_3 \mid E_1 E_2) = \tfrac{1}{6} P(E_3 \mid E_1 E_2)$$

However, given that the first two men get their own hats it follows that the
third man must also get his own hat (since there are no other hats left). That
is, $P(E_3 \mid E_1 E_2) = 1$ and so

$$P(E_1 E_2 E_3) = \tfrac{1}{6}$$

Now, from Equation (1.4) we have that

$$\begin{aligned}
P(E_1 \cup E_2 \cup E_3) &= P(E_1) + P(E_2) + P(E_3) - P(E_1 E_2) - P(E_1 E_3) - P(E_2 E_3) + P(E_1 E_2 E_3) \\
&= 1 - \tfrac{1}{2} + \tfrac{1}{6} = \tfrac{2}{3}
\end{aligned}$$

Hence, the probability that none of the men selects his own hat is
$1 - \tfrac{2}{3} = \tfrac{1}{3}$.
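
Because there are only 3! = 6 ways for the three men to pick up the hats, the answer to Example 1.8 can also be confirmed by listing them all. The Python sketch below is an illustration only; it assumes each assignment of hats is equally likely and encodes an assignment as a tuple, a representation chosen here.

```python
from fractions import Fraction
from itertools import permutations

n = 3  # number of men (and hats)

# perm[i] is the hat selected by man i; all n! assignments are equally likely.
assignments = list(permutations(range(n)))
no_match = [perm for perm in assignments
            if all(perm[i] != i for i in range(n))]

p_no_match = Fraction(len(no_match), len(assignments))
print(p_no_match)  # 1/3
```

Changing n gives the corresponding probability for a larger party, although the number of assignments grows as n! and the enumeration quickly becomes slow.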


1.5 Independent Events 

Two events E and F are said to be independent if 
P(EF) = P(E)P(F) 

By Equation (1.5) this implies that E and F are independent if 
P(E|F) = P(E) 


(which also implies that P(F|E) = P(F)). That is, E and F are independent if 
knowledge that F has occurred does not affect the probability that E occurs. 
That is, the occurrence of E is independent of whether or not F occurs. 

Two events E and F that are not independent are said to be dependent. 


Example 1.9 Suppose we toss two fair dice. Let $E_1$ denote the event that the
sum of the dice is six and F denote the event that the first die equals four. Then

$$P(E_1 F) = P(\{(4, 2)\}) = \tfrac{1}{36}$$

while

$$P(E_1)P(F) = \tfrac{5}{36} \cdot \tfrac{1}{6} = \tfrac{5}{216}$$

and hence $E_1$ and F are not independent. Intuitively, the reason for this is clear
for if we are interested in the possibility of throwing a six (with two dice), then we
will be quite happy if the first die lands four (or any of the numbers 1, 2, 3, 4, 5)
because then we still have a possibility of getting a total of six. On the other hand,
if the first die landed six, then we would be unhappy as we would no longer have
a chance of getting a total of six. In other words, our chance of getting a total
of six depends on the outcome of the first die and hence $E_1$ and F cannot be
independent.

Let $E_2$ be the event that the sum of the dice equals seven. Is $E_2$ independent
of F? The answer is yes since

$$P(E_2 F) = P(\{(4, 3)\}) = \tfrac{1}{36}$$

while

$$P(E_2)P(F) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36}$$

We leave it for you to present the intuitive argument why the event that the sum of
the dice equals seven is independent of the outcome on the first die.
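
The two independence checks of Example 1.9 amount to comparing P(EF) with P(E)P(F). A minimal Python sketch of that comparison follows; the helper names prob and independent are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Two fair dice with 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

def independent(A, B):
    """True when P(AB) = P(A)P(B), the definition of independence."""
    return prob(A & B) == prob(A) * prob(B)

F = {(i, j) for (i, j) in S if i == 4}        # the first die equals four
E1 = {(i, j) for (i, j) in S if i + j == 6}   # the sum of the dice is six
E2 = {(i, j) for (i, j) in S if i + j == 7}   # the sum of the dice is seven

print(independent(E1, F), independent(E2, F))  # False True
```

Exact fractions matter here: with floating-point arithmetic the equality test could fail because of rounding.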


The definition of independence can be extended to more than two events. 
The events $E_1, E_2, \ldots, E_n$ are said to be independent if for every subset
$E_{1'}, E_{2'}, \ldots, E_{r'}$, $r \le n$, of these events

$$P(E_{1'} E_{2'} \cdots E_{r'}) = P(E_{1'}) P(E_{2'}) \cdots P(E_{r'})$$

Intuitively, the events $E_1, E_2, \ldots, E_n$ are independent if knowledge of the
occurrence of any of these events has no effect on the probability of any other
event.


Example 1.10 (Pairwise Independent Events That Are Not Independent) Let a 
ball be drawn from an urn containing four balls, numbered 1, 2, 3, 4. Let E = 
{1,2}, F = {1,3}, G = {1,4}. If all four outcomes are assumed equally likely, 
then 


$$\begin{aligned}
P(EF) &= P(E)P(F) = \tfrac{1}{4}, \\
P(EG) &= P(E)P(G) = \tfrac{1}{4}, \\
P(FG) &= P(F)P(G) = \tfrac{1}{4}
\end{aligned}$$

However,

$$\tfrac{1}{4} = P(EFG) \ne P(E)P(F)P(G)$$

Hence, even though the events E, F, G are pairwise independent, they are not
jointly independent.
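
The distinction in Example 1.10 between pairwise and joint independence is easy to confirm by direct computation. The following Python sketch is an illustration under the example's assumption of four equally likely outcomes; the variable names pairwise and joint are introduced here.

```python
from fractions import Fraction
from itertools import combinations

# One ball drawn from an urn with balls numbered 1, 2, 3, 4.
S = {1, 2, 3, 4}

def prob(event):
    return Fraction(len(event), len(S))

E, F, G = {1, 2}, {1, 3}, {1, 4}

# Every pair of events satisfies the product rule ...
pairwise = all(prob(A & B) == prob(A) * prob(B)
               for A, B in combinations([E, F, G], 2))
# ... but the three events together do not.
joint = prob(E & F & G) == prob(E) * prob(F) * prob(G)

print(pairwise, joint)  # True False
```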


Example 1.11 There are r players, with player i initially having $n_i$ units,
$n_i > 0$, $i = 1, \ldots, r$. At each stage, two of the players are chosen to play a game,
with the winner of the game receiving 1 unit from the loser. Any player whose
fortune drops to 0 is eliminated, and this continues until a single player has
all $n = \sum_{i=1}^{r} n_i$ units, with that player designated as the victor. Assuming that
the results of successive games are independent, and that each game is equally
likely to be won by either of its two players, find the probability that player i is
the victor.


Solution: To begin, suppose that there are n players, with each player initially
having 1 unit. Consider player i. Each stage she plays will be equally likely to
result in her either winning or losing 1 unit, with the results from each stage
being independent. In addition, she will continue to play stages until her fortune
becomes either 0 or n. Because this is the same for all players, it follows that
each player has the same chance of being the victor. Consequently, each player
has probability 1/n of being the victor. Now, suppose these n players
are divided into r teams, with team i containing $n_i$ players, $i = 1, \ldots, r$. That
is, suppose players $1, \ldots, n_1$ constitute team 1, players $n_1 + 1, \ldots, n_1 + n_2$
constitute team 2 and so on. Then the probability that the victor is a member
of team i is $n_i/n$. But because team i initially has a total fortune of $n_i$ units,
$i = 1, \ldots, r$, and each game played by members of different teams results in
the fortune of the winner’s team increasing by 1 and that of the loser’s team
decreasing by 1, it is easy to see that the probability that the victor is from
team i is exactly the desired probability. Moreover, our argument also shows
that the result is true no matter how the choices of the players in each stage
are made.
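
The conclusion of Example 1.11, that player i is the victor with probability $n_i/n$, can be explored by simulation. The Python sketch below is not from the text and makes one choice the example leaves open: the two players in each stage are picked uniformly at random among those not yet eliminated. The initial fortunes 1, 2, 3, the number of trials, and the seed are arbitrary values chosen for illustration.

```python
import random

def play_until_victor(fortunes, rng):
    """Simulate the game of Example 1.11 once and return the victor's index.

    Assumption (not fixed by the text): the two players in each stage are
    chosen uniformly at random among the players not yet eliminated.
    """
    fortunes = list(fortunes)
    alive = [i for i, f in enumerate(fortunes) if f > 0]
    while len(alive) > 1:
        a, b = rng.sample(alive, 2)
        # Each game is equally likely to be won by either of its two players.
        winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
        fortunes[winner] += 1
        fortunes[loser] -= 1
        if fortunes[loser] == 0:
            alive.remove(loser)   # a player with fortune 0 is eliminated
    return alive[0]

def estimate_victory_probs(fortunes, trials=20000, seed=0):
    """Estimate each player's probability of being the victor."""
    rng = random.Random(seed)
    wins = [0] * len(fortunes)
    for _ in range(trials):
        wins[play_until_victor(fortunes, rng)] += 1
    return [w / trials for w in wins]

initial = [1, 2, 3]
estimates = estimate_victory_probs(initial)
exact = [f / sum(initial) for f in initial]   # n_i / n from the example
print(estimates)
print(exact)
```

The estimates should fall close to 1/6, 1/3, and 1/2. According to the final remark of the example, a different rule for choosing the players in each stage should leave these values unchanged.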


Suppose that a sequence of experiments, each of which results in either a 
“success” or a “failure,” is to be performed. Let $E_i$, $i \ge 1$, denote the event
that the ith experiment results in a success. If, for all $i_1, i_2, \ldots, i_n$,

$$P(E_{i_1} E_{i_2} \cdots E_{i_n}) = \prod_{j=1}^{n} P(E_{i_j})$$

we say that the sequence of experiments consists of independent trials.


1.6 Bayes’ Formula


Let E and F be events. We may express E as

$$E = EF \cup EF^c$$

because in order for a point to be in E, it must either be in both E and F, or it
must be in E and not in F. Since EF and $EF^c$ are mutually exclusive, we have that

$$\begin{aligned}
P(E) &= P(EF) + P(EF^c) \\
&= P(E \mid F)P(F) + P(E \mid F^c)P(F^c) \\
&= P(E \mid F)P(F) + P(E \mid F^c)(1 - P(F))
\end{aligned} \tag{1.7}$$


Equation (1.7) states that the probability of the event E is a weighted average 
of the conditional probability of E given that F has occurred and the
conditional probability of E given that F has not occurred, each conditional
probability being given as much weight as the event on which it is conditioned has of
occurring. 


Example 1.12 Consider two urns. The first contains two white and seven black 
balls, and the second contains five white and six black balls. We flip a fair coin and 
then draw a ball from the first urn or the second urn depending on whether the
outcome was heads or tails. What is the conditional probability that the outcome 
of the toss was heads given that a white ball was selected? 


Solution: Let W be the event that a white ball is drawn, and let H be the 
event that the coin comes up heads. The desired probability P(H|W) may be 
calculated as follows: 


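$$\begin{aligned}
P(H|W) = \frac{P(HW)}{P(W)} &= \frac{P(W|H)P(H)}{P(W)} \\
&= \frac{P(W|H)P(H)}{P(W|H)P(H) + P(W|H^c)P(H^c)} \\
&= \frac{\frac{2}{9} \cdot \frac{1}{2}}{\frac{2}{9} \cdot \frac{1}{2} + \frac{5}{11} \cdot \frac{1}{2}} = \frac{22}{67}
\end{aligned}$$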


Example 1.13 In answering a question on a multiple-choice test a student 
either knows the answer or guesses. Let p be the probability that she knows 
the answer and 1 — p the probability that she guesses. Assume that a student 
who guesses at the answer will be correct with probability 1/m, where m is 
the number of multiple-choice alternatives. What is the conditional probabil- 
ity that a student knew the answer to a question given that she answered it 
correctly? 


Solution: Let C and K denote respectively the event that the student answers 
the question correctly and the event that she actually knows the answer. 
Now 


$$\begin{aligned}
P(K|C) = \frac{P(KC)}{P(C)} &= \frac{P(C|K)P(K)}{P(C|K)P(K) + P(C|K^c)P(K^c)} \\
&= \frac{p}{p + (1/m)(1 - p)} \\
&= \frac{mp}{1 + (m - 1)p}
\end{aligned}$$


Thus, for example, if $m = 5$, $p = \frac{1}{2}$, then the probability that a student knew the
answer to a question she correctly answered is $\frac{5}{6}$. ■


Example 1.14 A laboratory blood test is 95 percent effective in detecting a cer- 
tain disease when it is, in fact, present. However, the test also yields a “false 
positive” result for 1 percent of the healthy persons tested. (That is, if a healthy 
person is tested, then, with probability 0.01, the test result will imply he has the 
disease.) If 0.5 percent of the population actually has the disease, what is the 
probability a person has the disease given that his test result is positive? 

Solution: Let D be the event that the tested person has the disease, and E
the event that his test result is positive. The desired probability P(D|E) is 
obtained by 


$$\begin{aligned}
P(D|E) = \frac{P(DE)}{P(E)} &= \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} \\
&= \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} \\
&= \frac{95}{294} \approx 0.323
\end{aligned}$$


Thus, only 32 percent of those persons whose test results are positive actually 
have the disease. a 


Equation (1.7) may be generalized in the following manner. Suppose that 
$F_1, F_2, \ldots, F_n$ are mutually exclusive events such that $\bigcup_{i=1}^{n} F_i = S$. In other words,
exactly one of the events $F_1, F_2, \ldots, F_n$ will occur. By writing

$$E = \bigcup_{i=1}^{n} EF_i$$

and using the fact that the events $EF_i$, $i = 1, \ldots, n$, are mutually exclusive, we
obtain that


$$\begin{aligned}
P(E) &= \sum_{i=1}^{n} P(EF_i) \\
&= \sum_{i=1}^{n} P(E|F_i)P(F_i)
\end{aligned} \tag{1.8}$$


Thus, Equation (1.8) shows how, for given events $F_1, F_2, \ldots, F_n$ of which one
and only one must occur, we can compute $P(E)$ by first “conditioning” upon
which one of the $F_i$ occurs. That is, it states that $P(E)$ is equal to a weighted
average of $P(E|F_i)$, each term being weighted by the probability of the event on
which it is conditioned.

Suppose now that E has occurred and we are interested in determining which
one of the $F_j$ also occurred. By Equation (1.8) we have that


$$\begin{aligned}
P(F_j|E) &= \frac{P(EF_j)}{P(E)} \\
&= \frac{P(E|F_j)P(F_j)}{\sum_{i=1}^{n} P(E|F_i)P(F_i)}
\end{aligned} \tag{1.9}$$


Equation (1.9) is known as Bayes’ formula. 
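
As an illustration, Bayes’ formula can be evaluated mechanically once the prior probabilities $P(F_j)$ and the conditional probabilities $P(E|F_j)$ are listed. The following is a minimal Python sketch, not part of the original text; the function name `posterior` and the variable names are chosen here for illustration only. It uses exact fractions and reuses the numbers of Examples 1.12 and 1.14.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return the list of P(F_j | E), given P(F_j) and P(E | F_j)."""
    # Each product is P(E | F_j) P(F_j) = P(E F_j).
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Summing the products gives P(E), as in Equation (1.8).
    total = sum(joint)
    # Dividing each product by P(E) is Equation (1.9).
    return [j / total for j in joint]

# Example 1.12: F_1 = heads (urn 1), F_2 = tails (urn 2), E = white ball drawn.
urn = posterior([Fraction(1, 2), Fraction(1, 2)],
                [Fraction(2, 9), Fraction(5, 11)])

# Example 1.14: F_1 = diseased, F_2 = healthy, E = positive test result.
blood = posterior([Fraction(5, 1000), Fraction(995, 1000)],
                  [Fraction(95, 100), Fraction(1, 100)])

print(urn[0], blood[0])  # 22/67 95/294
```

Working with `Fraction` rather than floating point keeps the output directly comparable with the hand computations in the examples.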




Example 1.15 You know that a certain letter is equally likely to be in any one
of three different folders. Let $\alpha_i$ be the probability that you will find your letter
upon making a quick examination of folder $i$ if the letter is, in fact, in folder
$i$, $i = 1, 2, 3$. (We may have $\alpha_i < 1$.) Suppose you look in folder 1 and do not
find the letter. What is the probability that the letter is in folder 1?


Solution: Let $F_i$, $i = 1, 2, 3$, be the event that the letter is in folder $i$; and let
E be the event that a search of folder 1 does not come up with the letter. We
desire $P(F_1|E)$. From Bayes’ formula we obtain

$$\begin{aligned}
P(F_1|E) &= \frac{P(E|F_1)P(F_1)}{\sum_{i=1}^{3} P(E|F_i)P(F_i)} \\
&= \frac{(1 - \alpha_1)\frac{1}{3}}{(1 - \alpha_1)\frac{1}{3} + \frac{1}{3} + \frac{1}{3}} = \frac{1 - \alpha_1}{3 - \alpha_1}
\end{aligned}$$ ■

Exercises 


1. A box contains three marbles: one red, one green, and one blue. Consider an exper- 
iment that consists of taking one marble from the box then replacing it in the box 
and drawing a second marble from the box. What is the sample space? If, at all 
times, each marble in the box is equally likely to be selected, what is the probability 
of each point in the sample space? 


*2. Repeat Exercise 1 when the second marble is drawn without replacing the first 
marble. 


3. A coin is to be tossed until a head appears twice in a row. What is the sample space
for this experiment? If the coin is fair, what is the probability that it will be tossed 
exactly four times? 


4. Let E,F,G be three events. Find expressions for the events that of E, F,G 
(a) only F occurs, 

(b) both E and F but not G occur, 

(c) at least one event occurs, 

(d) at least two events occur, 

(e) all three events occur, 

(f) none occurs, 

(g) at most one occurs, 

(h) at most two occur. 


*5. An individual uses the following gambling system at Las Vegas. He bets $1 that
the roulette wheel will come up red. If he wins, he quits. If he loses then he makes
the same bet a second time only this time he bets $2; and then regardless of the
outcome, quits. Assuming that he has a probability of 1/2 of winning each bet, what
is the probability that he goes home a winner? Why is this system not used by 
everyone? 


6. Show that $E(F \cup G) = EF \cup EG$.
7. Show that $(E \cup F)^c = E^c F^c$.


8. If $P(E) = 0.9$ and $P(F) = 0.8$, show that $P(EF) \ge 0.7$. In general, show that

$$P(EF) \ge P(E) + P(F) - 1$$

This is known as Bonferroni’s inequality.


9. We say that $E \subset F$ if every point in E is also in F. Show that if $E \subset F$, then

$$P(F) = P(E) + P(FE^c) \ge P(E)$$


10. Show that

$$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$$

This is known as Boole’s inequality.

Hint: Either use Equation (1.2) and mathematical induction, or else show that
$\bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{n} F_i$, where $F_1 = E_1$, $F_i = E_i \bigcap_{j=1}^{i-1} E_j^c$, and use property (iii) of a
probability.


11. If two fair dice are tossed, what is the probability that the sum is $i$, $i = 2, 3, \ldots, 12$?


12. Let E and F be mutually exclusive events in the sample space of an experiment.
Suppose that the experiment is repeated until either event E or event F occurs.
What does the sample space of this new super experiment look like? Show that the
probability that event E occurs before event F is $P(E)/[P(E) + P(F)]$.

Hint: Argue that the probability that the original experiment is performed $n$ times
and E appears on the $n$th time is $P(E) \times (1 - p)^{n-1}$, $n = 1, 2, \ldots$, where $p = P(E) + P(F)$.
Add these probabilities to get the desired answer.


13. The dice game craps is played as follows. The player throws two dice, and if the sum
is seven or eleven, then she wins. If the sum is two, three, or twelve, then she loses.
If the sum is anything else, then she continues throwing until she either throws that
number again (in which case she wins) or she throws a seven (in which case she
loses). Calculate the probability that the player wins.


14. The probability of winning on a single toss of the dice is $p$. A starts, and if he
fails, he passes the dice to B, who then attempts to win on her toss. They continue
tossing the dice back and forth until one of them wins. What are their respective
probabilities of winning?


15. Argue that $E = EF \cup EF^c$, $E \cup F = E \cup FE^c$.


16. Use Exercise 15 to show that $P(E \cup F) = P(E) + P(F) - P(EF)$.


*17. Suppose each of three persons tosses a coin. If the outcome of one of the tosses
differs from the other outcomes, then the game ends. If not, then the persons start
over and retoss their coins. Assuming fair coins, what is the probability that the
game will end with the first round of tosses? If all three coins are biased and have
probability 1/4 of landing heads, what is the probability that the game will end at
the first round?


18. Assume that each child who is born is equally likely to be a boy or a girl. If a family
has two children, what is the probability that both are girls given that (a) the eldest
is a girl, (b) at least one is a girl?


*19. Two dice are rolled. What is the probability that at least one is a six? If the two
faces are different, what is the probability that at least one is a six?


20. Three dice are thrown. What is the probability the same number appears on exactly
two of the three dice?


21. Suppose that 5 percent of men and 0.25 percent of women are color-blind. A color-blind
person is chosen at random. What is the probability of this person being male?
Assume that there are an equal number of males and females.


22. A and B play until one has 2 more points than the other. Assuming that each point
is independently won by A with probability $p$, what is the probability they will play
a total of $2n$ points? What is the probability that A will win?


23. For events $E_1, E_2, \ldots, E_n$ show that

$$P(E_1 E_2 \cdots E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1 E_2) \cdots P(E_n|E_1 \cdots E_{n-1})$$


24. In an election, candidate A receives $n$ votes and candidate B receives $m$ votes, where
$n > m$. Assume that in the count of the votes all possible orderings of the $n + m$
votes are equally likely. Let $P_{n,m}$ denote the probability that from the first vote on
A is always in the lead. Find

(a) $P_{2,1}$ (b) $P_{3,1}$ (c) $P_{n,1}$ (d) $P_{3,2}$ (e) $P_{4,2}$
(f) $P_{n,2}$ (g) $P_{4,3}$ (h) $P_{5,3}$ (i) $P_{5,4}$
(j) Make a conjecture as to the value of $P_{n,m}$.


*25. Two cards are randomly selected from a deck of 52 playing cards.
(a) What is the probability they constitute a pair (that is, that they are of the same
denomination)?
(b) What is the conditional probability they constitute a pair given that they are
of different suits?


26. A deck of 52 playing cards, containing all 4 aces, is randomly divided into 4 piles
of 13 cards each. Define events $E_1$, $E_2$, $E_3$, and $E_4$ as follows:

$E_1$ = {the first pile has exactly 1 ace},
$E_2$ = {the second pile has exactly 1 ace},
$E_3$ = {the third pile has exactly 1 ace},
$E_4$ = {the fourth pile has exactly 1 ace}

Use Exercise 23 to find $P(E_1 E_2 E_3 E_4)$, the probability that each pile has an ace.


*27. Suppose in Exercise 26 we had defined the events $E_i$, $i = 1, 2, 3, 4$, by

$E_1$ = {one of the piles contains the ace of spades},
$E_2$ = {the ace of spades and the ace of hearts are in different piles},
$E_3$ = {the ace of spades, the ace of hearts, and the ace of diamonds are in different piles},
$E_4$ = {all 4 aces are in different piles}

Now use Exercise 23 to find $P(E_1 E_2 E_3 E_4)$, the probability that each pile has an
ace. Compare your answer with the one you obtained in Exercise 26.


28. If the occurrence of B makes A more likely, does the occurrence of A make B more
likely?


29. Suppose that $P(E) = 0.6$. What can you say about $P(E|F)$ when
(a) E and F are mutually exclusive?
(b) $E \subset F$?
(c) $F \subset E$?


*30. Bill and George go target shooting together. Both shoot at a target at the same time.
Suppose Bill hits the target with probability 0.7, whereas George, independently,
hits the target with probability 0.4.
(a) Given that exactly one shot hit the target, what is the probability that it was
George’s shot?
(b) Given that the target is hit, what is the probability that George hit it?


31. What is the conditional probability that the first die is six given that the sum of the
dice is seven?


*32. Suppose all $n$ men at a party throw their hats in the center of the room. Each man
then randomly selects a hat. Show that the probability that none of the $n$ men selects
his own hat is

$$\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}$$

Note that as $n \to \infty$ this converges to $e^{-1}$. Is this surprising?


33. In a class there are four freshman boys, six freshman girls, and six sophomore boys.
How many sophomore girls must be present if sex and class are to be independent
when a student is selected at random?


34. Mr. Jones has devised a gambling system for winning at roulette. When he bets, he
bets on red, and places a bet only when the ten previous spins of the roulette have
landed on a black number. He reasons that his chance of winning is quite large
since the probability of eleven consecutive spins resulting in black is quite small.
What do you think of this system?


35. A fair coin is continually flipped. What is the probability that the first four flips are
(a) H, H, H, H?
(b) T, H, H, H?
(c) What is the probability that the pattern T, H, H, H occurs before the pattern
H, H, H, H?


36. Consider two boxes, one containing one black and one white marble, the other,
two black and one white marble. A box is selected at random and a marble is
drawn at random from the selected box. What is the probability that the marble is
black?


37. In Exercise 36, what is the probability that the first box was the one selected given
that the marble is white?


38. Urn 1 contains two white balls and one black ball, while urn 2 contains one white
ball and five black balls. One ball is drawn at random from urn 1 and placed in urn
2. A ball is then drawn from urn 2. It happens to be white. What is the probability
that the transferred ball was white?


39. Stores A, B, and C have 50, 75, and 100 employees, and, respectively, 50, 60, and
70 percent of these are women. Resignations are equally likely among all employees,
regardless of sex. One employee resigns and this is a woman. What is the probability
that she works in store C?


40. (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects
one of the coins at random, and when he flips it, it shows heads. What is the
probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads.
Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what
is the probability that it is the fair coin?


41. In a certain species of rats, black dominates over brown. Suppose that a black rat
with two black parents has a brown sibling.
(a) What is the probability that this rat is a pure black rat (as opposed to being a
hybrid with one black and one brown gene)?
(b) Suppose that when the black rat is mated with a brown rat, all five of their
offspring are black. Now, what is the probability that the rat is a pure black
rat?


42. There are three coins in a box. One is a two-headed coin, another is a fair coin,
and the third is a biased coin that comes up heads 75 percent of the time. When
one of the three coins is selected at random and flipped, it shows heads. What is
the probability that it was the two-headed coin?


43. Suppose we have ten coins which are such that if the $i$th one is flipped then heads will
appear with probability $i/10$, $i = 1, 2, \ldots, 10$. When one of the coins is randomly
selected and flipped, it shows heads. What is the conditional probability that it was
the fifth coin?


44. Urn 1 has five white and seven black balls. Urn 2 has three white and twelve black
balls. We flip a fair coin. If the outcome is heads, then a ball from urn 1 is selected,
while if the outcome is tails, then a ball from urn 2 is selected. Suppose that a white
ball is selected. What is the probability that the coin landed tails?


*45. An urn contains $b$ black balls and $r$ red balls. One of the balls is drawn at random,
but when it is put back in the urn $c$ additional balls of the same color are put in with
it. Now suppose that we draw another ball. Show that the probability that the first
ball drawn was black given that the second ball drawn was red is $b/(b + r + c)$.


46. Three prisoners are informed by their jailer that one of them has been chosen at
random to be executed, and the other two are to be freed. Prisoner A asks the jailer
to tell him privately which of his fellow prisoners will be set free, claiming that
there would be no harm in divulging this information, since he already knows that
at least one will go free. The jailer refuses to answer this question, pointing out
that if A knew which of his fellows were to be set free, then his own probability of
being executed would rise from 1/3 to 1/2 since he would then be one of two prisoners.
What do you think of the jailer’s reasoning?


47. For a fixed event B, show that the collection $P(A|B)$, defined for all events A, satisfies
the three conditions for a probability. Conclude from this that

$$P(A|B) = P(A|BC)P(C|B) + P(A|BC^c)P(C^c|B)$$

Then directly verify the preceding equation.


*48. Sixty percent of the families in a certain community own their own car, thirty
percent own their own home, and twenty percent own both their own car and their 
own home. If a family is randomly chosen, what is the probability that this family 
owns a car or a house but not both? 


2.1 Random Variables


It frequently occurs that in performing an experiment we are mainly interested in 
some functions of the outcome as opposed to the outcome itself. For instance, in 
tossing dice we are often interested in the sum of the two dice and are not really 
concerned about the actual outcome. That is, we may be interested in knowing 
that the sum is seven and not be concerned over whether the actual outcome was 
(1, 6) or (2, 5) or (3, 4) or (4, 3) or (5, 2) or (6, 1). These quantities of interest, 
or more formally, these real-valued functions defined on the sample space, are 
known as random variables. 

Since the value of a random variable is determined by the outcome of the 
experiment, we may assign probabilities to the possible values of the random 
variable. 


Example 2.1 Letting X denote the random variable that is defined as the sum of
two fair dice; then

$$\begin{aligned}
P\{X = 2\} &= P\{(1,1)\} = \tfrac{1}{36}, \\
P\{X = 3\} &= P\{(1,2), (2,1)\} = \tfrac{2}{36}, \\
P\{X = 4\} &= P\{(1,3), (2,2), (3,1)\} = \tfrac{3}{36}, \\
P\{X = 5\} &= P\{(1,4), (2,3), (3,2), (4,1)\} = \tfrac{4}{36}, \\
P\{X = 6\} &= P\{(1,5), (2,4), (3,3), (4,2), (5,1)\} = \tfrac{5}{36}, \\
P\{X = 7\} &= P\{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\} = \tfrac{6}{36}, \\
P\{X = 8\} &= P\{(2,6), (3,5), (4,4), (5,3), (6,2)\} = \tfrac{5}{36}, \\
P\{X = 9\} &= P\{(3,6), (4,5), (5,4), (6,3)\} = \tfrac{4}{36}, \\
P\{X = 10\} &= P\{(4,6), (5,5), (6,4)\} = \tfrac{3}{36}, \\
P\{X = 11\} &= P\{(5,6), (6,5)\} = \tfrac{2}{36}, \\
P\{X = 12\} &= P\{(6,6)\} = \tfrac{1}{36}
\end{aligned} \tag{2.1}$$

In other words, the random variable X can take on any integral value between
two and twelve, and the probability that it takes on each value is given by
Equation (2.1). Since X must take on one of the values two through twelve, we must
have

$$1 = P\left\{\bigcup_{n=2}^{12} \{X = n\}\right\} = \sum_{n=2}^{12} P\{X = n\}$$

which may be checked from Equation (2.1). ■
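
The probabilities in Equation (2.1) can also be checked by brute-force enumeration of the 36 equally likely outcomes. The snippet below is a small Python sketch added for illustration; the variable names are arbitrary and not from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered outcomes (a, b) give each sum a + b.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Each outcome has probability 1/36, so P{X = s} = (number of outcomes) / 36.
dice_pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, prob in dice_pmf.items():
    # Fraction reduces automatically, so 2/36 is displayed as 1/18.
    print(s, prob)

# The probabilities over s = 2, ..., 12 add up to 1.
print(sum(dice_pmf.values()))
```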

Example 2.2 For a second example, suppose that our experiment consists of
tossing two fair coins. Letting Y denote the number of heads appearing, then Y is
a random variable taking on one of the values 0, 1, 2 with respective probabilities

$$\begin{aligned}
P\{Y = 0\} &= P\{(T,T)\} = \tfrac{1}{4}, \\
P\{Y = 1\} &= P\{(T,H), (H,T)\} = \tfrac{2}{4}, \\
P\{Y = 2\} &= P\{(H,H)\} = \tfrac{1}{4}
\end{aligned}$$

Of course, $P\{Y = 0\} + P\{Y = 1\} + P\{Y = 2\} = 1$. ■


Example 2.3 Suppose that we toss a coin having a probability p of coming 
up heads, until the first head appears. Letting N denote the number of flips 
required, then assuming that the outcomes of successive flips are independent,
N is a random variable taking on one of the values 1, 2, 3, ..., with respective
probabilities

$$\begin{aligned}
P\{N = 1\} &= P\{H\} = p, \\
P\{N = 2\} &= P\{(T,H)\} = (1 - p)p, \\
P\{N = 3\} &= P\{(T,T,H)\} = (1 - p)^2 p, \\
&\;\;\vdots \\
P\{N = n\} &= P\{(\underbrace{T,T,\ldots,T}_{n-1},H)\} = (1 - p)^{n-1} p, \quad n \ge 1
\end{aligned}$$

As a check, note that

$$\begin{aligned}
P\left(\bigcup_{n=1}^{\infty} \{N = n\}\right) &= \sum_{n=1}^{\infty} P\{N = n\} \\
&= p \sum_{n=1}^{\infty} (1 - p)^{n-1} \\
&= \frac{p}{1 - (1 - p)} \\
&= 1
\end{aligned}$$ ■


Example 2.4 Suppose that our experiment consists of seeing how long a battery 
can operate before wearing down. Suppose also that we are not primarily inter- 
ested in the actual lifetime of the battery but are concerned only about whether 
or not the battery lasts at least two years. In this case, we may define the random 
variable I by 


$$I = \begin{cases} 1, & \text{if the lifetime of the battery is two or more years} \\ 0, & \text{otherwise} \end{cases}$$


If E denotes the event that the battery lasts two or more years, then the random
variable I is known as the indicator random variable for event E. (Note that I
equals 1 or 0 depending on whether or not E occurs.) ■


Example 2.5 Suppose that independent trials, each of which results in any of $m$
possible outcomes with respective probabilities $p_1, \ldots, p_m$, $\sum_{i=1}^{m} p_i = 1$, are
continually performed. Let X denote the number of trials needed until each outcome
has occurred at least once.

Rather than directly considering $P\{X = n\}$ we will first determine $P\{X > n\}$,
the probability that at least one of the outcomes has not yet occurred after $n$
trials. Letting $A_i$ denote the event that outcome $i$ has not yet occurred after the
first $n$ trials, $i = 1, \ldots, m$, then

$$\begin{aligned}
P\{X > n\} &= P\left(\bigcup_{i=1}^{m} A_i\right) \\
&= \sum_{i=1}^{m} P(A_i) - \sum\sum_{i<j} P(A_i A_j) \\
&\quad + \sum\sum\sum_{i<j<k} P(A_i A_j A_k) - \cdots + (-1)^{m+1} P(A_1 \cdots A_m)
\end{aligned}$$

Now, $P(A_i)$ is the probability that each of the first $n$ trials results in a non-$i$
outcome, and so by independence

$$P(A_i) = (1 - p_i)^n$$

Similarly, $P(A_i A_j)$ is the probability that the first $n$ trials all result in a non-$i$ and
non-$j$ outcome, and so

$$P(A_i A_j) = (1 - p_i - p_j)^n$$

As all of the other probabilities are similar, we see that

$$\begin{aligned}
P\{X > n\} &= \sum_{i=1}^{m} (1 - p_i)^n - \sum\sum_{i<j} (1 - p_i - p_j)^n \\
&\quad + \sum\sum\sum_{i<j<k} (1 - p_i - p_j - p_k)^n - \cdots
\end{aligned}$$

Since $P\{X = n\} = P\{X > n - 1\} - P\{X > n\}$, we see, upon using the algebraic
identity $(1 - a)^{n-1} - (1 - a)^n = a(1 - a)^{n-1}$, that

$$\begin{aligned}
P\{X = n\} &= \sum_{i=1}^{m} p_i (1 - p_i)^{n-1} - \sum\sum_{i<j} (p_i + p_j)(1 - p_i - p_j)^{n-1} \\
&\quad + \sum\sum\sum_{i<j<k} (p_i + p_j + p_k)(1 - p_i - p_j - p_k)^{n-1} - \cdots
\end{aligned}$$ ■
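
The inclusion/exclusion sum for $P\{X > n\}$ is easy to evaluate by looping over all nonempty subsets of outcomes. The following Python sketch is an illustration only; the function names `prob_more_than` and `prob_exactly` are hypothetical, and the probabilities are assumed to be supplied as exact fractions.

```python
from fractions import Fraction
from itertools import combinations

def prob_more_than(p, n):
    """P{X > n}: at least one outcome has not occurred in the first n trials."""
    total = Fraction(0)
    for k in range(1, len(p) + 1):
        # Subsets of size k enter with sign (-1)^(k+1).
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(p, k):
            # Probability that none of the outcomes in the subset occurs in n trials.
            total += sign * (1 - sum(subset)) ** n
    return total

def prob_exactly(p, n):
    """P{X = n} = P{X > n - 1} - P{X > n}."""
    return prob_more_than(p, n - 1) - prob_more_than(p, n)

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(prob_exactly(p, 3))  # 1/6
```

With three outcomes, $X = 3$ requires the first three trials to be all different, which has probability $3! \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{6} = \frac{1}{6}$, in agreement with the printed value. The loop visits $2^m - 1$ subsets, so this direct approach is only practical for small $m$.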


In all of the preceding examples, the random variables of interest took on 
either a finite or a countable number of possible values.* Such random variables 
are called discrete. However, there also exist random variables that take on a 
continuum of possible values. These are known as continuous random variables. 
One example is the random variable denoting the lifetime of a car, when the car’s 
lifetime is assumed to take on any value in some interval (a, b). 

The cumulative distribution function (cdf) (or more simply the distribution
function) $F(\cdot)$ of the random variable X is defined for any real number $b$,
$-\infty < b < \infty$, by

$$F(b) = P\{X \le b\}$$


In words, F(b) denotes the probability that the random variable X takes on a 
value that is less than or equal to b. Some properties of the cdf F are 


* A set is countable if its elements can be put in a one-to-one correspondence with the sequence of 
positive integers. 
(i) $F(b)$ is a nondecreasing function of $b$,
(ii) $\lim_{b \to \infty} F(b) = F(\infty) = 1$,
(iii) $\lim_{b \to -\infty} F(b) = F(-\infty) = 0$.


Property (i) follows since for $a < b$ the event $\{X \le a\}$ is contained in the event
$\{X \le b\}$, and so it cannot have a larger probability. Properties (ii) and (iii) follow
since X must take on some finite value.

All probability questions about X can be answered in terms of the cdf F(-). For 
example, 


$$P\{a < X \le b\} = F(b) - F(a) \quad \text{for all } a < b$$


This follows since we may calculate $P\{a < X \le b\}$ by first computing the
probability that $X \le b$ (that is, $F(b)$) and then subtracting from this the probability
that $X \le a$ (that is, $F(a)$).

If we desire the probability that X is strictly smaller than $b$, we may calculate
this probability by

$$\begin{aligned}
P\{X < b\} &= \lim_{h \to 0^+} P\{X \le b - h\} \\
&= \lim_{h \to 0^+} F(b - h)
\end{aligned}$$

where $\lim_{h \to 0^+}$ means that we are taking the limit as $h$ decreases to 0. Note that
$P\{X < b\}$ does not necessarily equal $F(b)$ since $F(b)$ also includes the probability
that X equals $b$.


2.2 Discrete Random Variables 


As was previously mentioned, a random variable that can take on at most a 
countable number of possible values is said to be discrete. For a discrete random 
variable X, we define the probability mass function p(a) of X by 


p(a) = P{X =a} 


The probability mass function $p(a)$ is positive for at most a countable number of
values of $a$. That is, if X must assume one of the values $x_1, x_2, \ldots$, then

$$\begin{aligned}
p(x_i) &> 0, \quad i = 1, 2, \ldots \\
p(x) &= 0, \quad \text{all other values of } x
\end{aligned}$$

Since X must take on one of the values $x_i$, we have

$$\sum_{i=1}^{\infty} p(x_i) = 1$$

Figure 2.1. Graph of F(x).


The cumulative distribution function F can be expressed in terms of p(a) by 


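$$F(a) = \sum_{\text{all } x_i \le a} p(x_i)$$

For instance, suppose X has a probability mass function given by

$$p(1) = \tfrac{1}{2}, \quad p(2) = \tfrac{1}{3}, \quad p(3) = \tfrac{1}{6}$$

then, the cumulative distribution function F of X is given by

$$F(a) = \begin{cases} 0, & a < 1 \\ \tfrac{1}{2}, & 1 \le a < 2 \\ \tfrac{5}{6}, & 2 \le a < 3 \\ 1, & 3 \le a \end{cases}$$
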


This is graphically presented in Figure 2.1. 
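
The step-function behavior of F can be reproduced directly from the definition of the cdf as a sum of mass-function values. The Python sketch below is illustrative only; it takes $p(1) = \frac{1}{2}$, $p(2) = \frac{1}{3}$, $p(3) = \frac{1}{6}$ as the example mass function, and the names `example_pmf` and `cdf` are chosen here for convenience.

```python
from fractions import Fraction

# Example mass function: p(1) = 1/2, p(2) = 1/3, p(3) = 1/6.
example_pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def cdf(a):
    """F(a): the sum of p(x_i) over all x_i <= a."""
    return sum((prob for x, prob in example_pmf.items() if x <= a), Fraction(0))

# F is constant between the mass points and jumps by p(x_i) at each x_i.
for a in (0.5, 1, 1.9, 2, 2.5, 3, 10):
    print(a, cdf(a))
```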
Discrete random variables are often classified according to their probability 
mass functions. We now consider some of these random variables. 


2.2.1 The Bernoulli Random Variable 


Suppose that a trial, or an experiment, whose outcome can be classified as either 
a “success” or as a “failure” is performed. If we let X equal 1 if the outcome 
is a success and 0 if it is a failure, then the probability mass function of X is 
given by 


$$\begin{aligned}
p(0) &= P\{X = 0\} = 1 - p, \\
p(1) &= P\{X = 1\} = p
\end{aligned} \tag{2.2}$$

where $p$, $0 \le p \le 1$, is the probability that the trial is a “success.”

A random variable X is said to be a Bernoulli random variable if its probability
mass function is given by Equation (2.2) for some $p \in (0, 1)$.


2.2.2 The Binomial Random Variable 


Suppose that $n$ independent trials, each of which results in a “success” with
probability $p$ and in a “failure” with probability $1 - p$, are to be performed. If X
represents the number of successes that occur in the $n$ trials, then X is said to be
a binomial random variable with parameters $(n, p)$.

The probability mass function of a binomial random variable having parameters
$(n, p)$ is given by

$$p(i) = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, \ldots, n \tag{2.3}$$

where

$$\binom{n}{i} = \frac{n!}{(n - i)!\, i!}$$

equals the number of different groups of $i$ objects that can be chosen from a set
of $n$ objects. The validity of Equation (2.3) may be verified by first noting that the
probability of any particular sequence of the $n$ outcomes containing $i$ successes
and $n - i$ failures is, by the assumed independence of trials, $p^i (1 - p)^{n-i}$.
Equation (2.3) then follows since there are $\binom{n}{i}$ different sequences of the $n$ outcomes
leading to $i$ successes and $n - i$ failures. For instance, if $n = 3$, $i = 2$, then there are
$\binom{3}{2} = 3$ ways in which the three trials can result in two successes. Namely, any
one of the three outcomes $(s,s,f)$, $(s,f,s)$, $(f,s,s)$, where the outcome $(s,s,f)$
means that the first two trials are successes and the third a failure. Since each
of the three outcomes $(s,s,f)$, $(s,f,s)$, $(f,s,s)$ has a probability $p^2(1 - p)$ of
occurring, the desired probability is thus $\binom{3}{2} p^2 (1 - p)$.

Note that, by the binomial theorem, the probabilities sum to one, that is,

$$\sum_{i=0}^{n} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i} = (p + (1 - p))^n = 1$$

Example 2.6 Four fair coins are flipped. If the outcomes are assumed indepen- 
dent, what is the probability that two heads and two tails are obtained? 


Solution: Letting X equal the number of heads (“successes”) that appear,
then X is a binomial random variable with parameters $(n = 4, p = \frac{1}{2})$.
Hence, by Equation (2.3),

$$P\{X = 2\} = \binom{4}{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^2 = \tfrac{3}{8}$$ ■

Example 2.7 It is known that any item produced by a certain machine will
be defective with probability 0.1, independently of any other item. What is the 
probability that in a sample of three items, at most one will be defective? 


Solution: If X is the number of defective items in the sample, then X is a bino- 
mial random variable with parameters (3, 0.1). Hence, the desired probability 
is given by 


$$P\{X = 0\} + P\{X = 1\} = \binom{3}{0} (0.1)^0 (0.9)^3 + \binom{3}{1} (0.1)^1 (0.9)^2 = 0.972$$ ■
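
The binomial probabilities in Examples 2.6 and 2.7 can be recomputed from Equation (2.3). The Python sketch below is illustrative; the function name `binomial_pmf` is chosen here and is not from the text, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(i, n, p):
    """Equation (2.3): the probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

# Example 2.6: two heads in four flips of a fair coin.
two_heads = binomial_pmf(2, 4, Fraction(1, 2))

# Example 2.7: at most one defective item among three, each defective w.p. 0.1.
at_most_one = sum(binomial_pmf(i, 3, Fraction(1, 10)) for i in (0, 1))

print(two_heads, float(at_most_one))  # 3/8 0.972
```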


Example 2.8 Suppose that an airplane engine will fail, when in flight, with prob- 
ability 1 — p independently from engine to engine; suppose that the airplane will 
make a successful flight if at least 50 percent of its engines remain operative. For 
what values of p is a four-engine plane preferable to a two-engine plane? 


Solution: Because each engine is assumed to fail or function independently 
of what happens with the other engines, it follows that the number of engines 
remaining operative is a binomial random variable. Hence, the probability that 
a four-engine plane makes a successful flight is 


$$\begin{aligned}
&\binom{4}{2} p^2 (1 - p)^2 + \binom{4}{3} p^3 (1 - p) + \binom{4}{4} p^4 (1 - p)^0 \\
&\quad = 6p^2 (1 - p)^2 + 4p^3 (1 - p) + p^4
\end{aligned}$$

whereas the corresponding probability for a two-engine plane is

$$\binom{2}{1} p (1 - p) + \binom{2}{2} p^2 = 2p(1 - p) + p^2$$

Hence the four-engine plane is safer if

$$6p^2 (1 - p)^2 + 4p^3 (1 - p) + p^4 \ge 2p(1 - p) + p^2$$

or equivalently if

$$6p(1 - p)^2 + 4p^2 (1 - p) + p^3 \ge 2 - p$$

which simplifies to

$$3p^3 - 8p^2 + 7p - 2 \ge 0 \quad \text{or} \quad (p - 1)^2 (3p - 2) \ge 0$$

which is equivalent to

$$3p - 2 \ge 0 \quad \text{or} \quad p \ge \tfrac{2}{3}$$

Hence, the four-engine plane is safer when the engine success probability is at
least as large as $\frac{2}{3}$, whereas the two-engine plane is safer when this probability
falls below $\frac{2}{3}$. ■


Example 2.9 Suppose that a particular trait of a person (such as eye color 
or left handedness) is classified on the basis of one pair of genes and suppose 
that d represents a dominant gene and r a recessive gene. Thus a person with 
dd genes is pure dominance, one with rr is pure recessive, and one with rd 
is hybrid. The pure dominance and the hybrid are alike in appearance. Chil- 
dren receive one gene from each parent. If, with respect to a particular trait, 
two hybrid parents have a total of four children, what is the probability that 
exactly three of the four children have the outward appearance of the dominant 
gene? 


Solution: If we assume that each child is equally likely to inherit either of two 
genes from each parent, the probabilities that the child of two hybrid parents 
will have dd, rr, or rd pairs of genes are, respectively, $\frac{1}{4}$, $\frac{1}{4}$, $\frac{1}{2}$. Hence, because
an offspring will have the outward appearance of the dominant gene if its gene
pair is either dd or rd, it follows that the number of such children is binomially
distributed with parameters $(4, \frac{3}{4})$. Thus the desired probability is

$$\binom{4}{3} \left(\tfrac{3}{4}\right)^3 \left(\tfrac{1}{4}\right)^1 = \tfrac{27}{64}$$ ■

Remark on Terminology If X is a binomial random variable with parameters 
(n, p), then we say that X has a binomial distribution with parameters (n, p). 


2.2.3 The Geometric Random Variable 


Suppose that independent trials, each having probability p of being a success, are 
performed until a success occurs. If we let X be the number of trials required 
until the first success, then X is said to be a geometric random variable with 
parameter p. Its probability mass function is given by 


$$p(n) = P\{X = n\} = (1 - p)^{n-1} p, \quad n = 1, 2, \ldots \tag{2.4}$$


Equation (2.4) follows because X equals $n$ exactly when the first $n - 1$ trials are
failures and the $n$th trial is a success, and because the outcomes of the successive
trials are assumed to be independent.

To check that $p(n)$ is a probability mass function, we note that

$$\sum_{n=1}^{\infty} p(n) = p \sum_{n=1}^{\infty} (1 - p)^{n-1} = 1$$

2.2.4 The Poisson Random Variable


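A random variable X, taking on one of the values 0, 1, 2, ..., is said to be a
Poisson random variable with parameter $\lambda$, if for some $\lambda > 0$,

$$p(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^i}{i!}, \quad i = 0, 1, \ldots \tag{2.5}$$

Equation (2.5) defines a probability mass function since

$$\sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1$$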

The Poisson random variable has a wide range of applications in a diverse number 
of areas, as will be seen in Chapter 5. 

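An important property of the Poisson random variable is that it may be used 
to approximate a binomial random variable when the binomial parameter $n$ is 
large and $p$ is small. To see this, suppose that X is a binomial random variable 
with parameters $(n, p)$, and let $\lambda = np$. Then 

$$
\begin{aligned}
P\{X = i\} &= \frac{n!}{(n-i)!\,i!}\,p^{i}(1-p)^{n-i} \\
&= \frac{n!}{(n-i)!\,i!}\left(\frac{\lambda}{n}\right)^{i}\left(1-\frac{\lambda}{n}\right)^{n-i} \\
&= \frac{n(n-1)\cdots(n-i+1)}{n^{i}}\,\frac{\lambda^{i}}{i!}\,\frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{i}}
\end{aligned}
$$

Now, for $n$ large and $p$ small, 

$$\left(1-\frac{\lambda}{n}\right)^{n} \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^{i}} \approx 1, \qquad \left(1-\frac{\lambda}{n}\right)^{i} \approx 1$$

Hence, for $n$ large and $p$ small, 

$$P\{X = i\} \approx e^{-\lambda}\frac{\lambda^{i}}{i!}$$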
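
To see how close the approximation is in practice, the two probability mass functions can be tabulated side by side. This is only an illustrative sketch: the values $n = 1000$ and $p = 0.003$ are assumptions chosen for the demonstration, and the helper names are not from the text.

```python
import math

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """P{X = i} for a Poisson random variable with parameter lam (Equation 2.5)."""
    return math.exp(-lam) * lam**i / math.factorial(i)

n, p = 1000, 0.003      # assumed values: n large, p small
lam = n * p             # matching Poisson parameter, lambda = np
for i in range(7):
    print(i, round(binomial_pmf(i, n, p), 5), round(poisson_pmf(i, lam), 5))
```

For these values the two columns agree to roughly three decimal places; making $n$ larger with $np$ held fixed tightens the agreement further.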


Example 2.10 Suppose that the number of typographical errors on a single page 
of this book has a Poisson distribution with parameter $\lambda = 1$. Calculate the 
probability that there is at least one error on this page. 


Solution: 

$$P\{X \ge 1\} = 1 - P\{X = 0\} = 1 - e^{-1} \approx 0.632$$

Example 2.11 If the number of accidents occurring on a highway each day is a 
Poisson random variable with parameter $\lambda = 3$, what is the probability that no 
accidents occur today? 


Solution: 

$$P\{X = 0\} = e^{-3} \approx 0.05$$


Example 2.12 Consider an experiment that consists of counting the number of 
α-particles given off in a one-second interval by one gram of radioactive material. 
If we know from past experience that, on the average, 3.2 such α-particles are 
given off, what is a good approximation to the probability that no more than 
two α-particles will appear? 


Solution: If we think of the gram of radioactive material as consisting of a 
large number $n$ of atoms each of which has probability $3.2/n$ of disintegrating 
and sending off an α-particle during the second considered, then we see 
that, to a very close approximation, the number of α-particles given off will 
be a Poisson random variable with parameter $\lambda = 3.2$. Hence the desired 
probability is 

$$P\{X \le 2\} = e^{-3.2} + 3.2\,e^{-3.2} + \frac{(3.2)^{2}}{2}\,e^{-3.2} \approx 0.380$$
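
The three Poisson calculations in Examples 2.10 through 2.12 can be reproduced with a few lines of code. The sketch below is an illustration only; the helper names `poisson_pmf` and `poisson_cdf` are assumptions introduced here, not notation from the text.

```python
import math

def poisson_pmf(i, lam):
    """P{X = i} = exp(-lam) * lam**i / i!  (Equation 2.5)."""
    return math.exp(-lam) * lam**i / math.factorial(i)

def poisson_cdf(k, lam):
    """P{X <= k}, obtained by summing the mass function over 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

at_least_one_error = 1 - poisson_pmf(0, 1.0)   # Example 2.10, lambda = 1
no_accidents = poisson_pmf(0, 3.0)             # Example 2.11, lambda = 3
at_most_two = poisson_cdf(2, 3.2)              # Example 2.12, lambda = 3.2

print(at_least_one_error, no_accidents, at_most_two)
```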


2.3 Continuous Random Variables 


In this section, we shall concern ourselves with random variables whose set of 
possible values is uncountable. Let X be such a random variable. We say that 
X is a continuous random variable if there exists a nonnegative function f(x), 
defined for all real $x \in (-\infty, \infty)$, having the property that for any set B of real 
numbers 

$$P\{X \in B\} = \int_{B} f(x)\,dx \tag{2.6}$$


The function f(x) is called the probability density function of the random vari- 
able X. 

In words, Equation (2.6) states that the probability that X will be in B may be 
obtained by integrating the probability density function over the set B. Since X 
must assume some value, f(x) must satisfy 


$$1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x)\,dx$$

All probability statements about X can be answered in terms of f(x). For instance, 
letting B = [a, b], we obtain from Equation (2.6) that 

$$P\{a \le X \le b\} = \int_{a}^{b} f(x)\,dx \tag{2.7}$$

If we let a = b in the preceding, then 

$$P\{X = a\} = \int_{a}^{a} f(x)\,dx = 0$$


In words, this equation states that the probability that a continuous random 
variable will assume any particular value is zero. 

The relationship between the cumulative distribution F(-) and the probability 
density f(-) is expressed by 


$$F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\,dx$$

Differentiating both sides of the preceding yields 

$$\frac{d}{da}F(a) = f(a)$$


That is, the density is the derivative of the cumulative distribution function. 
A somewhat more intuitive interpretation of the density function may be obtained 
from Equation (2.7) as follows: 


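$$P\left\{a - \frac{\varepsilon}{2} \le X \le a + \frac{\varepsilon}{2}\right\} = \int_{a-\varepsilon/2}^{a+\varepsilon/2} f(x)\,dx \approx \varepsilon f(a)$$

when $\varepsilon$ is small. In other words, the probability that X will be contained in an 
interval of length $\varepsilon$ around the point a is approximately $\varepsilon f(a)$. From this, we see 
that f(a) is a measure of how likely it is that the random variable will be near a. 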

There are several important continuous random variables that appear fre- 
quently in probability theory. The remainder of this section is devoted to a study 
of certain of these random variables. 


2.3.1 The Uniform Random Variable 


A random variable is said to be uniformly distributed over the interval (0,1) if 
its probability density function is given by 


$$f(x) = \begin{cases} 1, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$

Note that the preceding is a density function since $f(x) \ge 0$ and 

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{1} dx = 1$$


Since f(x) > 0 only when $x \in (0, 1)$, it follows that X must assume a value in 
(0, 1). Also, since f(x) is constant for $x \in (0, 1)$, X is just as likely to be “near” any 
value in (0, 1) as any other value. To check this, note that, for any $0 < a < b < 1$, 

$$P\{a \le X \le b\} = \int_{a}^{b} f(x)\,dx = b - a$$


In other words, the probability that X is in any particular subinterval of (0, 1) 
equals the length of that subinterval. 

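In general, we say that X is a uniform random variable on the interval $(\alpha, \beta)$ if 
its probability density function is given by 

$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha}, & \text{if } \alpha < x < \beta \\ 0, & \text{otherwise} \end{cases} \tag{2.8}$$


Example 2.13 Calculate the cumulative distribution function of a random variable 
uniformly distributed over $(\alpha, \beta)$. 


Solution: Since $F(a) = \int_{-\infty}^{a} f(x)\,dx$, we obtain from Equation (2.8) that 

$$F(a) = \begin{cases} 0, & a \le \alpha \\ \dfrac{a - \alpha}{\beta - \alpha}, & \alpha < a < \beta \\ 1, & a \ge \beta \end{cases}$$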


Example 2.14 If X is uniformly distributed over (0, 10), calculate the probability 
that (a) X < 3, (b) X > 7, (c)1< X <6. 


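Solution: 

$$
\begin{aligned}
P\{X < 3\} &= \int_{0}^{3}\frac{dx}{10} = \frac{3}{10}, \\
P\{X > 7\} &= \int_{7}^{10}\frac{dx}{10} = \frac{3}{10}, \\
P\{1 < X < 6\} &= \int_{1}^{6}\frac{dx}{10} = \frac{1}{2}
\end{aligned}
$$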
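
The cumulative distribution function of Example 2.13 gives all three probabilities of Example 2.14 directly. The sketch below is illustrative; `uniform_cdf` is a name introduced here, and exact fractions are used so that the results can be compared with the values above without rounding.

```python
from fractions import Fraction

def uniform_cdf(a, alpha, beta):
    """F(a) for a random variable uniformly distributed over (alpha, beta)."""
    if a <= alpha:
        return Fraction(0)
    if a >= beta:
        return Fraction(1)
    return Fraction(a - alpha) / Fraction(beta - alpha)

# Example 2.14 with (alpha, beta) = (0, 10)
prob_a = uniform_cdf(3, 0, 10)                          # P{X < 3}
prob_b = 1 - uniform_cdf(7, 0, 10)                      # P{X > 7}
prob_c = uniform_cdf(6, 0, 10) - uniform_cdf(1, 0, 10)  # P{1 < X < 6}
print(prob_a, prob_b, prob_c)
```

Because a continuous random variable assigns probability zero to any single point, strict and non-strict inequalities give the same answers here.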

2.3.2 Exponential Random Variables 


A continuous random variable whose probability density function is given, for 
some $\lambda > 0$, by 

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$

is said to be an exponential random variable with parameter $\lambda$. These random 
variables will be extensively studied in Chapter 5, so we will content ourselves 
here with just calculating the cumulative distribution function F: 

$$F(a) = \int_{0}^{a} \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda a}, \qquad a \ge 0$$

Note that $F(\infty) = \int_{0}^{\infty} \lambda e^{-\lambda x}\,dx = 1$, as, of course, it must. 

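
2.3.3 Gamma Random Variables 


A continuous random variable whose density is given by 

$$f(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$

for some $\lambda > 0$, $\alpha > 0$ is said to be a gamma random variable with parameters 
$\alpha$, $\lambda$. The quantity $\Gamma(\alpha)$ is called the gamma function and is defined by 

$$\Gamma(\alpha) = \int_{0}^{\infty} e^{-x}x^{\alpha - 1}\,dx$$

It is easy to show by induction that for integral $\alpha$, say, $\alpha = n$, 

$$\Gamma(n) = (n - 1)!$$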


2.3.4 Normal Random Variables 


We say that X is a normal random variable (or simply that X is normally 
distributed) with parameters $\mu$ and $\sigma^{2}$ if the density of X is given by 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^{2}/2\sigma^{2}}, \qquad -\infty < x < \infty$$

This density function is a bell-shaped curve that is symmetric around $\mu$ (see 
Figure 2.2). 


Figure 2.2. Normal density function. 


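An important fact about normal random variables is that if X is normally 
distributed with parameters $\mu$ and $\sigma^{2}$ then $Y = \alpha X + \beta$ is normally distributed 
with parameters $\alpha\mu + \beta$ and $\alpha^{2}\sigma^{2}$. To prove this, suppose first that $\alpha > 0$ and 
note that $F_Y(\cdot)$*, the cumulative distribution function of the random variable Y, 
is given by 

$$
\begin{aligned}
F_Y(a) &= P\{Y \le a\} \\
&= P\{\alpha X + \beta \le a\} \\
&= P\left\{X \le \frac{a - \beta}{\alpha}\right\} \\
&= \int_{-\infty}^{(a-\beta)/\alpha} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx \\
&= \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\alpha\sigma}\exp\left\{\frac{-(v - (\alpha\mu + \beta))^{2}}{2\alpha^{2}\sigma^{2}}\right\}dv
\end{aligned}
\tag{2.9}
$$

where the last equality is obtained by the change in variables $v = \alpha x + \beta$. 
However, since $F_Y(a) = \int_{-\infty}^{a} f_Y(v)\,dv$, it follows from Equation (2.9) that the 
probability density function $f_Y(\cdot)$ is given by 

$$f_Y(v) = \frac{1}{\sqrt{2\pi}\,\alpha\sigma}\exp\left\{\frac{-(v - (\alpha\mu + \beta))^{2}}{2(\alpha\sigma)^{2}}\right\}, \qquad -\infty < v < \infty$$

Hence, Y is normally distributed with parameters $\alpha\mu + \beta$ and $(\alpha\sigma)^{2}$. A similar 
result is also true when $\alpha < 0$. 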


* When there is more than one random variable under consideration, we shall denote the cumulative 
distribution function of a random variable Z by $F_Z(\cdot)$. Similarly, we shall denote the density of Z 
by $f_Z(\cdot)$. 


One implication of the preceding result is that if X is normally distributed with 
parameters $\mu$ and $\sigma^{2}$ then $Y = (X - \mu)/\sigma$ is normally distributed with parameters 
0 and 1. Such a random variable Y is said to have the standard or unit normal 
distribution. 
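
The practical use of standardization is that any normal probability reduces to a statement about the unit normal distribution. A minimal sketch follows, using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the numerical values of `mu`, `sigma`, and the evaluation point are assumptions chosen only for illustration.

```python
from statistics import NormalDist

def normal_cdf_via_standardization(x, mu, sigma):
    """P{X <= x} for X normal with parameters mu and sigma**2,
    computed as P{Z <= (x - mu) / sigma} with Z standard normal."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)   # NormalDist() defaults to mean 0, std dev 1

mu, sigma = 10.0, 2.0            # assumed illustrative parameters
direct = NormalDist(mu, sigma).cdf(13.0)
standardized = normal_cdf_via_standardization(13.0, mu, sigma)
print(direct, standardized)      # the two values agree up to rounding error
```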


2.4 Expectation of a Random Variable 


2.4.1 The Discrete Case 


If X is a discrete random variable having a probability mass function p(x), then 
the expected value of X is defined by 


$$E[X] = \sum_{x:\,p(x)>0} x\,p(x)$$

In other words, the expected value of X is a weighted average of the possible 
values that X can take on, each value being weighted by the probability that X 
assumes that value. For example, if the probability mass function of X is given by 

$$p(1) = \tfrac{1}{2} = p(2)$$

then 

$$E[X] = 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{2}\right) = \tfrac{3}{2}$$

is just an ordinary average of the two possible values 1 and 2 that X can assume. 
On the other hand, if 

$$p(1) = \tfrac{1}{3}, \qquad p(2) = \tfrac{2}{3}$$

then 

$$E[X] = 1\left(\tfrac{1}{3}\right) + 2\left(\tfrac{2}{3}\right) = \tfrac{5}{3}$$

is a weighted average of the two possible values 1 and 2 where the value 2 is 
given twice as much weight as the value 1 since p(2) = 2p(1). 


Example 2.15 Find E[X] where X is the outcome when we roll a fair die. 


Solution: Since $p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}$, we obtain 

$$E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \tfrac{7}{2}$$


Example 2.16 (Expectation of a Bernoulli Random Variable) Calculate E[X] 
when X is a Bernoulli random variable with parameter p. 

Solution: Since p(0) = 1 − p, p(1) = p, we have 

$$E[X] = 0(1 - p) + 1(p) = p$$

Thus, the expected number of successes in a single trial is just the probability 
that the trial will be a success. 


Example 2.17 (Expectation of a Binomial Random Variable) Calculate E[X] 
when X is binomially distributed with parameters n and p. 


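Solution: 

$$
\begin{aligned}
E[X] &= \sum_{i=0}^{n} i\,p(i) \\
&= \sum_{i=0}^{n} i\binom{n}{i}p^{i}(1-p)^{n-i} \\
&= \sum_{i=1}^{n} \frac{i\,n!}{(n-i)!\,i!}\,p^{i}(1-p)^{n-i} \\
&= \sum_{i=1}^{n} \frac{n!}{(n-i)!\,(i-1)!}\,p^{i}(1-p)^{n-i} \\
&= np\sum_{i=1}^{n} \frac{(n-1)!}{(n-i)!\,(i-1)!}\,p^{i-1}(1-p)^{n-i} \\
&= np\sum_{k=0}^{n-1} \binom{n-1}{k}p^{k}(1-p)^{n-1-k} \\
&= np\,[p + (1-p)]^{n-1} \\
&= np
\end{aligned}
$$

where the second from the last equality follows by letting $k = i - 1$. Thus, 
the expected number of successes in $n$ independent trials is $n$ multiplied by the 
probability that a trial results in a success. 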
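
The definition of expectation in the discrete case translates directly into a sum over the support, which gives a quick way to check results such as $E[X] = np$. The following sketch is illustrative: the function names and the parameter values $n = 10$, $p = 3/10$ are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def expectation(pmf):
    """E[X] = sum of x * p(x); pmf maps each possible value x to p(x)."""
    return sum(x * px for x, px in pmf.items())

n, p = 10, Fraction(3, 10)                    # assumed illustrative parameters
binomial = {i: binomial_pmf(i, n, p) for i in range(n + 1)}
die = {k: Fraction(1, 6) for k in range(1, 7)}

print(expectation(binomial), n * p)           # both equal np
print(expectation(die))                       # the fair-die mean of Example 2.15
```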


Example 2.18 (Expectation of a Geometric Random Variable) Calculate the 
expectation of a geometric random variable having parameter p. 


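Solution: By Equation (2.4), we have 

$$E[X] = \sum_{n=1}^{\infty} n\,p\,(1-p)^{n-1} = p\sum_{n=1}^{\infty} n\,q^{n-1}$$

where $q = 1 - p$. Since $n q^{n-1} = \frac{d}{dq}(q^{n})$, 

$$
\begin{aligned}
E[X] &= p\sum_{n=1}^{\infty} \frac{d}{dq}(q^{n}) \\
&= p\,\frac{d}{dq}\left(\sum_{n=1}^{\infty} q^{n}\right) \\
&= p\,\frac{d}{dq}\left(\frac{q}{1-q}\right) \\
&= \frac{p}{(1-q)^{2}} \\
&= \frac{1}{p}
\end{aligned}
$$

In words, the expected number of independent trials we need to perform until 
we attain our first success equals the reciprocal of the probability that any one 
trial results in a success. 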


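Example 2.19 (Expectation of a Poisson Random Variable) Calculate E[X] if X 
is a Poisson random variable with parameter $\lambda$. 


Solution: From Equation (2.5), we have 

$$
\begin{aligned}
E[X] &= \sum_{i=0}^{\infty} \frac{i\,e^{-\lambda}\lambda^{i}}{i!} \\
&= \sum_{i=1}^{\infty} \frac{e^{-\lambda}\lambda^{i}}{(i-1)!} \\
&= \lambda e^{-\lambda}\sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!} \\
&= \lambda e^{-\lambda}e^{\lambda} \\
&= \lambda
\end{aligned}
$$

where we have used the identity $\sum_{k=0}^{\infty} \lambda^{k}/k! = e^{\lambda}$. 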


2.4.2 The Continuous Case 


We may also define the expected value of a continuous random variable. This 
is done as follows. If X is a continuous random variable having a probability 
density function f(x), then the expected value of X is defined by 

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$


Example 2.20 (Expectation of a Uniform Random Variable) Calculate the expectation 
of a random variable uniformly distributed over $(\alpha, \beta)$. 


Solution: From Equation (2.8) we have 

$$
\begin{aligned}
E[X] &= \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\,dx \\
&= \frac{\beta^{2} - \alpha^{2}}{2(\beta - \alpha)} \\
&= \frac{\beta + \alpha}{2}
\end{aligned}
$$

In other words, the expected value of a random variable uniformly distributed 
over the interval $(\alpha, \beta)$ is just the midpoint of the interval. 


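Example 2.21 (Expectation of an Exponential Random Variable) Let X be 
exponentially distributed with parameter $\lambda$. Calculate E[X]. 


Solution: 

$$E[X] = \int_{0}^{\infty} x\lambda e^{-\lambda x}\,dx$$

Integrating by parts ($dv = \lambda e^{-\lambda x}\,dx$, $u = x$) yields 

$$
\begin{aligned}
E[X] &= -x e^{-\lambda x}\Big|_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda x}\,dx \\
&= 0 - \frac{e^{-\lambda x}}{\lambda}\Big|_{0}^{\infty} \\
&= \frac{1}{\lambda}
\end{aligned}
$$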


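Example 2.22 (Expectation of a Normal Random Variable) Calculate E[X] when 
X is normally distributed with parameters $\mu$ and $\sigma^{2}$. 


Solution: 

$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx$$

Writing $x$ as $(x - \mu) + \mu$ yields 

$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (x-\mu)\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx + \mu\,\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx$$

Letting $y = x - \mu$ leads to 

$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} y\,e^{-y^{2}/2\sigma^{2}}\,dy + \mu\int_{-\infty}^{\infty} f(x)\,dx$$

where f(x) is the normal density. By symmetry, the first integral must be 0, 
and so 

$$E[X] = \mu\int_{-\infty}^{\infty} f(x)\,dx = \mu$$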


2.4.3 Expectation of a Function of a Random Variable 


Suppose now that we are given a random variable X and its probability distri- 
bution (that is, its probability mass function in the discrete case or its probability 
density function in the continuous case). Suppose also that we are interested in 
calculating not the expected value of X, but the expected value of some function 
of X, say, g(X). How do we go about doing this? One way is as follows. Since 
g(X) is itself a random variable, it must have a probability distribution, which 
should be computable from a knowledge of the distribution of X. Once we have 
obtained the distribution of g(X), we can then compute E[g(X)] by the definition 
of the expectation. 


Example 2.23 Suppose X has the following probability mass function: 

$$p(0) = 0.2, \qquad p(1) = 0.5, \qquad p(2) = 0.3$$

Calculate $E[X^{2}]$. 


Solution: Letting $Y = X^{2}$, we have that Y is a random variable that can take 
on one of the values $0^{2}$, $1^{2}$, $2^{2}$ with respective probabilities 

$$
\begin{aligned}
p_Y(0) &= P\{Y = 0^{2}\} = 0.2, \\
p_Y(1) &= P\{Y = 1^{2}\} = 0.5, \\
p_Y(4) &= P\{Y = 2^{2}\} = 0.3
\end{aligned}
$$

Hence, 

$$E[X^{2}] = E[Y] = 0(0.2) + 1(0.5) + 4(0.3) = 1.7$$

Note that 

$$1.7 = E[X^{2}] \ne (E[X])^{2} = 1.21$$


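Example 2.24 Let X be uniformly distributed over (0, 1). Calculate $E[X^{3}]$. 


Solution: Letting $Y = X^{3}$, we calculate the distribution of Y as follows. For 
$0 \le a \le 1$, 

$$
\begin{aligned}
F_Y(a) &= P\{Y \le a\} \\
&= P\{X^{3} \le a\} \\
&= P\{X \le a^{1/3}\} \\
&= a^{1/3}
\end{aligned}
$$

where the last equality follows since X is uniformly distributed over (0, 1). By 
differentiating $F_Y(a)$, we obtain the density of Y, namely, 

$$f_Y(a) = \tfrac{1}{3}a^{-2/3}, \qquad 0 \le a \le 1$$

Hence, 

$$
\begin{aligned}
E[X^{3}] = E[Y] &= \int_{-\infty}^{\infty} a f_Y(a)\,da \\
&= \int_{0}^{1} a\,\tfrac{1}{3}a^{-2/3}\,da \\
&= \tfrac{1}{3}\int_{0}^{1} a^{1/3}\,da \\
&= \tfrac{1}{3}\cdot\tfrac{3}{4}\,a^{4/3}\Big|_{0}^{1} \\
&= \tfrac{1}{4}
\end{aligned}
$$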


While the foregoing procedure will, in theory, always enable us to compute 
the expectation of any function of X from a knowledge of the distribution of 
X, there is, fortunately, an easier way to do this. The following proposition 
shows how we can calculate the expectation of g(X) without first determining 
its distribution. 


Proposition 2.1 (a) If X is a discrete random variable with probability mass 
function p(x), then for any real-valued function g, 

$$E[g(X)] = \sum_{x:\,p(x)>0} g(x)p(x)$$

(b) If X is a continuous random variable with probability density function f(x), 
then for any real-valued function g, 

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx$$


Example 2.25 Applying the proposition to Example 2.23 yields 

$$E[X^{2}] = 0^{2}(0.2) + (1^{2})(0.5) + (2^{2})(0.3) = 1.7$$

which, of course, checks with the result derived in Example 2.23. 


Example 2.26 Applying the proposition to Example 2.24 yields 

$$E[X^{3}] = \int_{0}^{1} x^{3}\,dx \quad (\text{since } f(x) = 1,\ 0 < x < 1) \quad = \frac{1}{4}$$

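
Proposition 2.1 is also the natural way to compute $E[g(X)]$ in code, since it avoids deriving the distribution of $g(X)$. The sketch below is an illustration under stated assumptions: the function names are introduced here, and the continuous case uses a simple midpoint rule over a finite interval, which is adequate only when the density vanishes outside that interval.

```python
from fractions import Fraction

def expectation_of_function(g, pmf):
    """E[g(X)] = sum of g(x) p(x) over x with p(x) > 0 (Proposition 2.1(a))."""
    return sum(g(x) * px for x, px in pmf.items() if px > 0)

def expectation_of_function_continuous(g, f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g(x) f(x) over (lo, hi)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h      # midpoint of the k-th subinterval
        total += g(x) * f(x)
    return total * h

# Example 2.25: p(0) = 0.2, p(1) = 0.5, p(2) = 0.3, g(x) = x**2
pmf = {0: Fraction(2, 10), 1: Fraction(5, 10), 2: Fraction(3, 10)}
second_moment = expectation_of_function(lambda x: x**2, pmf)

# Example 2.26: X uniform on (0, 1), g(x) = x**3
third_moment = expectation_of_function_continuous(lambda x: x**3, lambda x: 1.0, 0.0, 1.0)

print(second_moment, third_moment)
```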
A simple corollary of Proposition 2.1 is the following. 


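Corollary 2.2 If a and b are constants, then 

$$E[aX + b] = aE[X] + b$$

Proof. In the discrete case, 

$$
\begin{aligned}
E[aX + b] &= \sum_{x:\,p(x)>0} (ax + b)p(x) \\
&= a\sum_{x:\,p(x)>0} x\,p(x) + b\sum_{x:\,p(x)>0} p(x) \\
&= aE[X] + b
\end{aligned}
$$

In the continuous case, 

$$
\begin{aligned}
E[aX + b] &= \int_{-\infty}^{\infty} (ax + b)f(x)\,dx \\
&= a\int_{-\infty}^{\infty} x f(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx \\
&= aE[X] + b
\end{aligned}
$$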


The expected value of a random variable X, E[X], is also referred to as the mean 
or the first moment of X. The quantity $E[X^{n}]$, $n \ge 1$, is called the $n$th moment 
of X. By Proposition 2.1, we note that 

$$E[X^{n}] = \begin{cases} \displaystyle\sum_{x:\,p(x)>0} x^{n}p(x), & \text{if X is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} x^{n}f(x)\,dx, & \text{if X is continuous} \end{cases}$$


Another quantity of interest is the variance of a random variable X, denoted 
by Var(X), which is defined by 

$$\mathrm{Var}(X) = E[(X - E[X])^{2}]$$


Thus, the variance of X measures the expected square of the deviation of X from 
its expected value. 


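Example 2.27 (Variance of the Normal Random Variable) Let X be normally 
distributed with parameters $\mu$ and $\sigma^{2}$. Find Var(X). 


Solution: Recalling (see Example 2.22) that $E[X] = \mu$, we have that 

$$
\begin{aligned}
\mathrm{Var}(X) &= E[(X - \mu)^{2}] \\
&= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (x - \mu)^{2}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx
\end{aligned}
$$

Substituting $y = (x - \mu)/\sigma$ yields 

$$\mathrm{Var}(X) = \frac{\sigma^{2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y^{2}e^{-y^{2}/2}\,dy$$

Integrating by parts ($u = y$, $dv = y e^{-y^{2}/2}\,dy$) gives 

$$
\begin{aligned}
\mathrm{Var}(X) &= \frac{\sigma^{2}}{\sqrt{2\pi}}\left(-y e^{-y^{2}/2}\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-y^{2}/2}\,dy\right) \\
&= \frac{\sigma^{2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^{2}/2}\,dy \\
&= \sigma^{2}
\end{aligned}
$$

Another derivation of Var(X) will be given in Example 2.42. 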


Suppose that X is continuous with density f, and let $E[X] = \mu$. Then, 

$$
\begin{aligned}
\mathrm{Var}(X) &= E[(X - \mu)^{2}] \\
&= E[X^{2} - 2\mu X + \mu^{2}] \\
&= \int_{-\infty}^{\infty} (x^{2} - 2\mu x + \mu^{2})f(x)\,dx \\
&= \int_{-\infty}^{\infty} x^{2}f(x)\,dx - 2\mu\int_{-\infty}^{\infty} x f(x)\,dx + \mu^{2}\int_{-\infty}^{\infty} f(x)\,dx \\
&= E[X^{2}] - 2\mu\mu + \mu^{2} \\
&= E[X^{2}] - \mu^{2}
\end{aligned}
$$

A similar proof holds in the discrete case, and so we obtain the useful identity 

$$\mathrm{Var}(X) = E[X^{2}] - (E[X])^{2}$$


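Example 2.28 Calculate Var(X) when X represents the outcome when a fair die 
is rolled. 


Solution: As previously noted in Example 2.15, $E[X] = \frac{7}{2}$. Also, 

$$E[X^{2}] = 1\left(\tfrac{1}{6}\right) + 2^{2}\left(\tfrac{1}{6}\right) + 3^{2}\left(\tfrac{1}{6}\right) + 4^{2}\left(\tfrac{1}{6}\right) + 5^{2}\left(\tfrac{1}{6}\right) + 6^{2}\left(\tfrac{1}{6}\right) = \tfrac{91}{6}$$

Hence, 

$$\mathrm{Var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^{2} = \frac{35}{12}$$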

2.5 Jointly Distributed Random Variables 


2.5.1 Joint Distribution Functions 


Thus far, we have concerned ourselves with the probability distribution of a single 
random variable. However, we are often interested in probability statements 
concerning two or more random variables. To deal with such probabilities, we define, 
for any two random variables X and Y, the joint cumulative probability 
distribution function of X and Y by 

$$F(a, b) = P\{X \le a,\ Y \le b\}, \qquad -\infty < a, b < \infty$$

The distribution of X can be obtained from the joint distribution of X and Y as 
follows: 

$$
\begin{aligned}
F_X(a) &= P\{X \le a\} \\
&= P\{X \le a,\ Y < \infty\} \\
&= F(a, \infty)
\end{aligned}
$$

Similarly, the cumulative distribution function of Y is given by 

$$F_Y(b) = P\{Y \le b\} = F(\infty, b)$$

In the case where X and Y are both discrete random variables, it is convenient to 
define the joint probability mass function of X and Y by 

$$p(x, y) = P\{X = x,\ Y = y\}$$

The probability mass function of X may be obtained from p(x, y) by 

$$p_X(x) = \sum_{y:\,p(x,y)>0} p(x, y)$$

Similarly, 

$$p_Y(y) = \sum_{x:\,p(x,y)>0} p(x, y)$$


We say that X and Y are jointly continuous if there exists a function f(x,y), 
defined for all real x and y, having the property that for all sets A and B of real 
numbers 


$$P\{X \in A,\ Y \in B\} = \int_{B}\int_{A} f(x, y)\,dx\,dy$$

The function f(x, y) is called the joint probability density function of X and Y. 
The probability density of X can be obtained from a knowledge of f(x, y) by the 
following reasoning: 

$$
\begin{aligned}
P\{X \in A\} &= P\{X \in A,\ Y \in (-\infty, \infty)\} \\
&= \int_{-\infty}^{\infty}\int_{A} f(x, y)\,dx\,dy \\
&= \int_{A} f_X(x)\,dx
\end{aligned}
$$

where 

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

is thus the probability density function of X. Similarly, the probability density 
function of Y is given by 

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

Because 

$$F(a, b) = P\{X \le a,\ Y \le b\} = \int_{-\infty}^{a}\int_{-\infty}^{b} f(x, y)\,dy\,dx$$

differentiation yields 

$$\frac{\partial^{2}}{\partial a\,\partial b}F(a, b) = f(a, b)$$

Thus, as in the single variable case, differentiating the probability distribution 
function gives the probability density function. 


A variation of Proposition 2.1 states that if X and Y are random variables and 
g is a function of two variables, then 


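$$
\begin{aligned}
E[g(X, Y)] &= \sum_{y}\sum_{x} g(x, y)p(x, y) && \text{in the discrete case} \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)f(x, y)\,dx\,dy && \text{in the continuous case}
\end{aligned}
$$

For example, if g(X, Y) = X + Y, then, in the continuous case, 

$$
\begin{aligned}
E[X + Y] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + y)f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f(x, y)\,dx\,dy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(x, y)\,dx\,dy \\
&= E[X] + E[Y]
\end{aligned}
$$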


where the first integral is evaluated by using the variation of Proposition 2.1 with 
g(x,y) = x, and the second with g(x, y) = y. 

The same result holds in the discrete case and, combined with the corollary in 
Section 2.4.3, yields that for any constants a, b 


$$E[aX + bY] = aE[X] + bE[Y] \tag{2.10}$$

Joint probability distributions may also be defined for $n$ random variables. 
The details are exactly the same as when $n = 2$ and are left as an exercise. The 
corresponding result to Equation (2.10) states that if $X_1, X_2, \ldots, X_n$ are $n$ random 
variables, then for any $n$ constants $a_1, a_2, \ldots, a_n$, 

$$E[a_1X_1 + a_2X_2 + \cdots + a_nX_n] = a_1E[X_1] + a_2E[X_2] + \cdots + a_nE[X_n] \tag{2.11}$$

Example 2.29 Calculate the expected sum obtained when three fair dice are 
rolled. 


Solution: Let X denote the sum obtained. Then $X = X_1 + X_2 + X_3$ where 
$X_i$ represents the value of the $i$th die. Thus, 

$$E[X] = E[X_1] + E[X_2] + E[X_3] = 3\left(\frac{7}{2}\right) = \frac{21}{2}$$


Example 2.30 As another example of the usefulness of Equation (2.11), let us 
use it to obtain the expectation of a binomial random variable having parameters 
n and p. Recalling that such a random variable X represents the number of 
successes in $n$ trials when each trial has probability p of being a success, we 
have 

$$X = X_1 + X_2 + \cdots + X_n$$

where 

$$X_i = \begin{cases} 1, & \text{if the $i$th trial is a success} \\ 0, & \text{if the $i$th trial is a failure} \end{cases}$$

Hence, $X_i$ is a Bernoulli random variable having expectation 
$E[X_i] = 1(p) + 0(1 - p) = p$. Thus, 

$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np$$

This derivation should be compared with the one presented in Example 2.17. 


Example 2.31 At a party N men throw their hats into the center of a room. The 
hats are mixed up and each man randomly selects one. Find the expected number 
of men who select their own hats. 


Solution: Letting X denote the number of men that select their own hats, we 
can best compute E[X] by noting that 

$$X = X_1 + X_2 + \cdots + X_N$$

where 

$$X_i = \begin{cases} 1, & \text{if the $i$th man selects his own hat} \\ 0, & \text{otherwise} \end{cases}$$

Now, because the $i$th man is equally likely to select any of the N hats, it follows 
that 

$$P\{X_i = 1\} = P\{\text{$i$th man selects his own hat}\} = \frac{1}{N}$$

and so 

$$E[X_i] = 1\,P\{X_i = 1\} + 0\,P\{X_i = 0\} = \frac{1}{N}$$

Hence, from Equation (2.11) we obtain 

$$E[X] = E[X_1] + \cdots + E[X_N] = \left(\frac{1}{N}\right)N = 1$$

Hence, no matter how many people are at the party, on the average exactly 
one of the men will select his own hat. 
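
For small N the result can be confirmed by brute force, since a random hat assignment is a uniformly chosen permutation and a man who gets his own hat is a fixed point of it. The following sketch enumerates all N! permutations; the function name `expected_matches` is an illustrative choice, and the approach is feasible only for small N.

```python
from fractions import Fraction
from itertools import permutations

def expected_matches(N):
    """Average number of fixed points over all N! equally likely hat assignments."""
    total_matches = 0
    count = 0
    for assignment in permutations(range(N)):
        # Man i selects his own hat exactly when assignment[i] == i.
        total_matches += sum(1 for i, hat in enumerate(assignment) if i == hat)
        count += 1
    return Fraction(total_matches, count)

for N in range(1, 7):
    print(N, expected_matches(N))
```

The indicator-variable argument gives the same answer for every N without any enumeration, which is the point of the example.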


Example 2.32 Suppose there are 25 different types of coupons and suppose that 
each time one obtains a coupon, it is equally likely to be any one of the 25 types. 
Compute the expected number of different types that are contained in a set of 10 
coupons. 


Solution: Let X denote the number of different types in the set of 10 coupons. 
We compute E[X] by using the representation 

$$X = X_1 + \cdots + X_{25}$$

where 

$$X_i = \begin{cases} 1, & \text{if at least one type $i$ coupon is in the set of 10} \\ 0, & \text{otherwise} \end{cases}$$

Now, 

$$
\begin{aligned}
E[X_i] &= P\{X_i = 1\} \\
&= P\{\text{at least one type $i$ coupon is in the set of 10}\} \\
&= 1 - P\{\text{no type $i$ coupons are in the set of 10}\} \\
&= 1 - \left(\tfrac{24}{25}\right)^{10}
\end{aligned}
$$

where the last equality follows since each of the 10 coupons will (independently) 
not be a type $i$ with probability $\frac{24}{25}$. Hence, 

$$E[X] = E[X_1] + \cdots + E[X_{25}] = 25\left[1 - \left(\tfrac{24}{25}\right)^{10}\right]$$
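
The same indicator argument works for any number of coupon types $m$ and any set size $k$, giving $m[1 - ((m-1)/m)^{k}]$. The sketch below evaluates this formula and checks it by exhaustive enumeration on a small case; the function names and the small test case $m = 4$, $k = 5$ are assumptions made for illustration, since enumerating $25^{10}$ outcomes is not practical.

```python
from fractions import Fraction
from itertools import product

def expected_distinct_types(m, k):
    """m * (1 - ((m - 1) / m)**k): expected number of distinct types among k coupons."""
    return m * (1 - Fraction(m - 1, m) ** k)

def expected_distinct_types_brute_force(m, k):
    """Average number of distinct types over all m**k equally likely coupon sequences."""
    outcomes = list(product(range(m), repeat=k))
    return Fraction(sum(len(set(outcome)) for outcome in outcomes), len(outcomes))

print(float(expected_distinct_types(25, 10)))           # the case treated in Example 2.32
print(expected_distinct_types(4, 5), expected_distinct_types_brute_force(4, 5))
```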


2.5.2 Independent Random Variables 


The random variables X and Y are said to be independent if, for all a, b, 

$$P\{X \le a,\ Y \le b\} = P\{X \le a\}P\{Y \le b\} \tag{2.12}$$

In other words, X and Y are independent if, for all a and b, the events $E_a = \{X \le a\}$ 
and $F_b = \{Y \le b\}$ are independent. 

In terms of the joint distribution function F of X and Y, we have that X and 
Y are independent if 

$$F(a, b) = F_X(a)F_Y(b) \qquad \text{for all } a, b$$

When X and Y are discrete, the condition of independence reduces to 

$$p(x, y) = p_X(x)p_Y(y) \tag{2.13}$$

while if X and Y are jointly continuous, independence reduces to 

$$f(x, y) = f_X(x)f_Y(y) \tag{2.14}$$


To prove this statement, consider first the discrete version, and suppose that the 
joint probability mass function p(x, y) satisfies Equation (2.13). Then 


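$$
\begin{aligned}
P\{X \le a,\ Y \le b\} &= \sum_{y \le b}\sum_{x \le a} p(x, y) \\
&= \sum_{y \le b}\sum_{x \le a} p_X(x)p_Y(y) \\
&= \sum_{y \le b} p_Y(y)\sum_{x \le a} p_X(x) \\
&= P\{Y \le b\}P\{X \le a\}
\end{aligned}
$$

and so X and Y are independent. That Equation (2.14) implies independence in 
the continuous case is proven in the same manner and is left as an exercise. 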
An important result concerning independence is the following. 


Proposition 2.3 If X and Y are independent, then for any functions h and g 


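$$E[g(X)h(Y)] = E[g(X)]E[h(Y)]$$

Proof. Suppose that X and Y are jointly continuous. Then 

$$
\begin{aligned}
E[g(X)h(Y)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f_X(x)f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} h(y)f_Y(y)\,dy\int_{-\infty}^{\infty} g(x)f_X(x)\,dx \\
&= E[h(Y)]E[g(X)]
\end{aligned}
$$

The proof in the discrete case is similar. 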

2.5.3 Covariance and Variance of Sums of Random Variables 


The covariance of any two random variables X and Y, denoted by Cov(X, Y), is 
defined by 

$$
\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])] \\
&= E[XY - YE[X] - XE[Y] + E[X]E[Y]] \\
&= E[XY] - E[Y]E[X] - E[X]E[Y] + E[X]E[Y] \\
&= E[XY] - E[X]E[Y]
\end{aligned}
$$

Note that if X and Y are independent, then by Proposition 2.3 it follows that 
Cov(X, Y) = 0. 

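Let us consider now the special case where X and Y are indicator variables 
for whether or not the events A and B occur. That is, for events A and B, 
define 

$$X = \begin{cases} 1, & \text{if A occurs} \\ 0, & \text{otherwise,} \end{cases} \qquad Y = \begin{cases} 1, & \text{if B occurs} \\ 0, & \text{otherwise} \end{cases}$$

Then, 

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$$

and, because XY will equal 1 or 0 depending on whether or not both X and Y 
equal 1, we see that 

$$\mathrm{Cov}(X, Y) = P\{X = 1,\ Y = 1\} - P\{X = 1\}P\{Y = 1\}$$

From this we see that 

$$
\begin{aligned}
\mathrm{Cov}(X, Y) > 0 &\iff P\{X = 1,\ Y = 1\} > P\{X = 1\}P\{Y = 1\} \\
&\iff \frac{P\{X = 1,\ Y = 1\}}{P\{X = 1\}} > P\{Y = 1\} \\
&\iff P\{Y = 1 \mid X = 1\} > P\{Y = 1\}
\end{aligned}
$$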


That is, the covariance of X and Y is positive if the outcome X = 1 makes it 
more likely that Y = 1 (which, as is easily seen by symmetry, also implies the 
reverse). 

In general it can be shown that a positive value of Cov(X, Y) is an indication 
that Y tends to increase as X does, whereas a negative value indicates that Y 
tends to decrease as X increases. 
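
Example 2.33 The joint density function of X, Y is 

$$f(x, y) = \frac{1}{y}\,e^{-(y + x/y)}, \qquad 0 < x, y < \infty$$

(a) Verify that the preceding is a joint density function. 
(b) Find Cov(X, Y). 


Solution: To show that f(x, y) is a joint density function we need to show it 
is nonnegative, which is immediate, and that $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$. We 
prove the latter as follows: 

$$
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx &= \int_{0}^{\infty}\int_{0}^{\infty} \frac{1}{y}\,e^{-(y + x/y)}\,dy\,dx \\
&= \int_{0}^{\infty} e^{-y}\int_{0}^{\infty} \frac{1}{y}\,e^{-x/y}\,dx\,dy \\
&= \int_{0}^{\infty} e^{-y}\,dy \\
&= 1
\end{aligned}
$$

To obtain Cov(X, Y), note that the density function of Y is 

$$f_Y(y) = e^{-y}\int_{0}^{\infty} \frac{1}{y}\,e^{-x/y}\,dx = e^{-y}$$

Thus, Y is an exponential random variable with parameter 1, showing (see 
Example 2.21) that 

$$E[Y] = 1$$

We compute E[X] and E[XY] as follows: 

$$
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f(x, y)\,dy\,dx \\
&= \int_{0}^{\infty} e^{-y}\int_{0}^{\infty} \frac{x}{y}\,e^{-x/y}\,dx\,dy
\end{aligned}
$$

Now, $\int_{0}^{\infty} \frac{x}{y}\,e^{-x/y}\,dx$ is the expected value of an exponential random variable 
with parameter 1/y, and thus is equal to y. Consequently, 

$$E[X] = \int_{0}^{\infty} y e^{-y}\,dy = 1$$

Also 

$$
\begin{aligned}
E[XY] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y f(x, y)\,dy\,dx \\
&= \int_{0}^{\infty} y e^{-y}\int_{0}^{\infty} \frac{x}{y}\,e^{-x/y}\,dx\,dy \\
&= \int_{0}^{\infty} y^{2}e^{-y}\,dy
\end{aligned}
$$

Integration by parts ($dv = e^{-y}\,dy$, $u = y^{2}$) gives 

$$E[XY] = \int_{0}^{\infty} y^{2}e^{-y}\,dy = -y^{2}e^{-y}\Big|_{0}^{\infty} + \int_{0}^{\infty} 2y e^{-y}\,dy = 2E[Y] = 2$$

Consequently, 

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = 1$$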


The following are important properties of covariance. 


Properties of Covariance 
For any random variables X, Y, Z and constant c, 


1. Cov(X, X) = Var(X), 

2. Cov(X, Y) = Cov(Y,X), 

3. Cov(cX, Y) = cCov(X, Y), 

4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z). 


Whereas the first three properties are immediate, the final one is easily proven 
as follows: 


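$$
\begin{aligned}
\mathrm{Cov}(X, Y + Z) &= E[X(Y + Z)] - E[X]E[Y + Z] \\
&= E[XY] - E[X]E[Y] + E[XZ] - E[X]E[Z] \\
&= \mathrm{Cov}(X, Y) + \mathrm{Cov}(X, Z)
\end{aligned}
$$

The fourth property listed easily generalizes to give the following result: 

$$\mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} \mathrm{Cov}(X_i, Y_j) \tag{2.15}$$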
i=1 j=l i=1 j=1 


A useful expression for the variance of the sum of random variables can be 
obtained from Equation (2.15) as follows: 


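$$
\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) &= \mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{n} X_j\right) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \mathrm{Cov}(X_i, X_i) + \sum_{i=1}^{n}\sum_{j \ne i} \mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i=1}^{n}\sum_{j<i} \mathrm{Cov}(X_i, X_j)
\end{aligned}
\tag{2.16}
$$

If $X_i$, $i = 1, \ldots, n$ are independent random variables, then Equation (2.16) 
reduces to 

$$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$$

Definition 2.1 If $X_1, \ldots, X_n$ are independent and identically distributed, then 
the random variable $\bar{X} = \sum_{i=1}^{n} X_i/n$ is called the sample mean. 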


The following proposition shows that the covariance between the sample mean 
and a deviation from that sample mean is zero. It will be needed in Section 2.6.1. 


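Proposition 2.4 Suppose that $X_1, \ldots, X_n$ are independent and identically 
distributed with expected value $\mu$ and variance $\sigma^{2}$. Then, 


(a) $E[\bar{X}] = \mu$. 
(b) $\mathrm{Var}(\bar{X}) = \sigma^{2}/n$. 
(c) $\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0$, $i = 1, \ldots, n$. 


Proof. Parts (a) and (b) are easily established as follows: 

$$
\begin{aligned}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, \\
\mathrm{Var}(\bar{X}) &= \left(\frac{1}{n}\right)^{2}\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^{2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^{2}}{n}
\end{aligned}
$$

To establish part (c) we reason as follows: 

$$
\begin{aligned}
\mathrm{Cov}(\bar{X}, X_i - \bar{X}) &= \mathrm{Cov}(\bar{X}, X_i) - \mathrm{Cov}(\bar{X}, \bar{X}) \\
&= \frac{1}{n}\mathrm{Cov}\left(X_i + \sum_{j \ne i} X_j,\ X_i\right) - \mathrm{Var}(\bar{X}) \\
&= \frac{1}{n}\mathrm{Cov}(X_i, X_i) + \frac{1}{n}\mathrm{Cov}\left(\sum_{j \ne i} X_j,\ X_i\right) - \frac{\sigma^{2}}{n} \\
&= \frac{\sigma^{2}}{n} - \frac{\sigma^{2}}{n} = 0
\end{aligned}
$$

where the final equality used the fact that $X_i$ and $\sum_{j \ne i} X_j$ are independent and 
thus have covariance 0. 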


Equation (2.16) is often useful when computing variances. 


Example 2.34 (Variance of a Binomial Random Variable) Compute the variance 
of a binomial random variable X with parameters n and p. 


Solution: Since such a random variable represents the number of successes in 
n independent trials when each trial has a common probability p of being a 
success, we may write 

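$$X = X_1 + \cdots + X_n$$

where the $X_i$ are independent Bernoulli random variables such that 

$$X_i = \begin{cases} 1, & \text{if the $i$th trial is a success} \\ 0, & \text{otherwise} \end{cases}$$

Hence, from Equation (2.16) we obtain 

$$\mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$$

But 

$$
\begin{aligned}
\mathrm{Var}(X_i) &= E[X_i^{2}] - (E[X_i])^{2} \\
&= E[X_i] - (E[X_i])^{2} \qquad \text{since } X_i^{2} = X_i \\
&= p - p^{2}
\end{aligned}
$$

and thus 

$$\mathrm{Var}(X) = np(1 - p)$$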


Example 2.35 (Sampling from a Finite Population: The Hypergeometric) Con- 
sider a population of N individuals, some of whom are in favor of a certain 
proposition. In particular suppose that Np of them are in favor and N — Np are 
opposed, where p is assumed to be unknown. We are interested in estimating p, 
the fraction of the population that is for the proposition, by randomly choosing 
and then determining the positions of $n$ members of the population. 

In such situations as described in the preceding, it is common to use the fraction 
of the sampled population that is in favor of the proposition as an estimator of 
p. Hence, if we let 

$$X_i = \begin{cases} 1, & \text{if the $i$th person chosen is in favor} \\ 0, & \text{otherwise} \end{cases}$$

then the usual estimator of p is $\sum_{i=1}^{n} X_i/n$. Let us now compute its mean and 
variance. Now, 

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = np$$

where the final equality follows since the $i$th person chosen is equally likely to be 
any of the N individuals in the population and so has probability Np/N of being 
in favor. 

$$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum\sum_{i<j} \mathrm{Cov}(X_i, X_j)$$

Now, since $X_i$ is a Bernoulli random variable with mean p, it follows that 

$$\mathrm{Var}(X_i) = p(1 - p)$$

Also, for $i \ne j$, 

$$
\begin{aligned}
\mathrm{Cov}(X_i, X_j) &= E[X_iX_j] - E[X_i]E[X_j] \\
&= P\{X_i = 1,\ X_j = 1\} - p^{2} \\
&= P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} - p^{2} \\
&= \frac{Np}{N}\,\frac{(Np - 1)}{N - 1} - p^{2}
\end{aligned}
$$

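where the last equality follows since if the $i$th person to be chosen is in favor, 
then the $j$th person chosen is equally likely to be any of the other N − 1 of which 
Np − 1 are in favor. Thus, we see that 

$$
\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) &= np(1 - p) + 2\binom{n}{2}\left[\frac{p(Np - 1)}{N - 1} - p^{2}\right] \\
&= np(1 - p) - \frac{n(n - 1)p(1 - p)}{N - 1}
\end{aligned}
$$

and so the mean and variance of our estimator are given by 

$$
\begin{aligned}
E\left[\sum_{i=1}^{n} \frac{X_i}{n}\right] &= p, \\
\mathrm{Var}\left[\sum_{i=1}^{n} \frac{X_i}{n}\right] &= \frac{p(1 - p)}{n} - \frac{(n - 1)p(1 - p)}{n(N - 1)}
\end{aligned}
$$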


Some remarks are in order: As the mean of the estimator is the unknown value p, 
we would like its variance to be as small as possible (why is this?), and we see by 
the preceding that, as a function of the population size N, the variance increases 
as N increases. The limiting value, as N — oo, of the variance is p(1 — p)/n, 
which is not surprising since for N large each of the X; will be (approximately) 
independent random variables, and thus }°7 X; will have an (approximately) 
binomial distribution with parameters 1 and p. 

The random variable $\sum_{i=1}^{n} X_i$ can be thought of as representing the number
of white balls obtained when n balls are randomly selected from a population
consisting of $Np$ white and $N - Np$ black balls. (Identify a person who favors the
proposition with a white ball and one against with a black ball.) Such a random
variable is called hypergeometric and has a probability mass function given by

$$P\left\{\sum_{i=1}^{n} X_i = k\right\} = \frac{\binom{Np}{k}\binom{N - Np}{n - k}}{\binom{N}{n}}$$
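As a quick numerical check of the preceding formulas, the following sketch computes the exact mean and variance of the number of sampled individuals in favor directly from the hypergeometric mass function and compares them with $np$ and $np(1 - p) - n(n - 1)p(1 - p)/(N - 1)$. The function names and the values N = 20, Np = 8, n = 5 are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P{k white balls} when n balls are drawn without replacement
    from K white and N - K black balls."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def sample_count_moments(N, K, n):
    """Exact mean and variance of the number of white balls in the sample."""
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    mean = sum(k * hypergeom_pmf(k, N, K, n) for k in support)
    second = sum(k * k * hypergeom_pmf(k, N, K, n) for k in support)
    return mean, second - mean ** 2

# Hypothetical numbers: N = 20 people, Np = 8 in favor, sample of size n = 5.
N, K, n = 20, 8, 5
p = Fraction(K, N)
mean, var = sample_count_moments(N, K, n)

# Variance formula derived in the text.
formula_var = n * p * (1 - p) - Fraction(n * (n - 1), N - 1) * p * (1 - p)
print(mean == n * p, var == formula_var)
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.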

It is often important to be able to calculate the distribution of X + Y from the
distributions of X and Y when X and Y are independent. Suppose first that X
and Y are continuous, X having probability density f and Y having probability
density g. Then, letting $F_{X+Y}(a)$ be the cumulative distribution function of X + Y,
we have

$$\begin{aligned}
F_{X+Y}(a) &= P\{X + Y \le a\} \\
&= \iint_{x+y \le a} f(x)g(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{a-y} f(x)g(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{a-y} f(x)\,dx\right) g(y)\,dy \\
&= \int_{-\infty}^{\infty} F_X(a - y)g(y)\,dy
\end{aligned} \tag{2.17}$$

The cumulative distribution function $F_{X+Y}$ is called the convolution of the
distributions $F_X$ and $F_Y$ (the cumulative distribution functions of X and Y,
respectively).

By differentiating Equation (2.17), we obtain that the probability density
function $f_{X+Y}(a)$ of X + Y is given by

$$\begin{aligned}
f_{X+Y}(a) &= \frac{d}{da} \int_{-\infty}^{\infty} F_X(a - y)g(y)\,dy \\
&= \int_{-\infty}^{\infty} \frac{d}{da}\bigl(F_X(a - y)\bigr)g(y)\,dy \\
&= \int_{-\infty}^{\infty} f(a - y)g(y)\,dy
\end{aligned} \tag{2.18}$$


Example 2.36 (Sum of Two Independent Uniform Random Variables) If X and
Y are independent random variables both uniformly distributed on (0, 1), then
calculate the probability density of X + Y.


Solution: From Equation (2.18), since

$$f(a) = g(a) = \begin{cases} 1, & 0 < a < 1 \\ 0, & \text{otherwise} \end{cases}$$

we obtain

$$f_{X+Y}(a) = \int_0^1 f(a - y)\,dy$$

For $0 \le a \le 1$, this yields

$$f_{X+Y}(a) = \int_0^a dy = a$$

For $1 < a < 2$, we get

$$f_{X+Y}(a) = \int_{a-1}^1 dy = 2 - a$$

Hence,

$$f_{X+Y}(a) = \begin{cases} a, & 0 \le a \le 1 \\ 2 - a, & 1 < a < 2 \\ 0, & \text{otherwise} \end{cases}$$ ■
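The convolution integral of Equation (2.18) can also be evaluated numerically. The sketch below approximates it for two uniform (0, 1) densities with a midpoint rule and compares the result with the triangular density obtained above. The function names, the step count, and the chosen check points are assumptions made for illustration.

```python
def uniform_density(x):
    """Density of a uniform (0, 1) random variable."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def convolve_density(f, g, a, lo=0.0, hi=1.0, steps=10_000):
    """Midpoint-rule approximation of the integral of f(a - y) g(y) dy
    over (lo, hi), the interval outside which g is assumed to vanish."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        y = lo + (j + 0.5) * h
        total += f(a - y) * g(y)
    return h * total

def triangular_density(a):
    """Closed form derived in Example 2.36."""
    if 0.0 <= a <= 1.0:
        return a
    if 1.0 < a < 2.0:
        return 2.0 - a
    return 0.0

check_points = [0.25, 0.5, 1.0, 1.5, 1.9]
approx = [convolve_density(uniform_density, uniform_density, a) for a in check_points]
exact = [triangular_density(a) for a in check_points]
for a, num, ref in zip(check_points, approx, exact):
    print(f"a = {a}: numeric {num:.4f}, closed form {ref:.4f}")
```

Because the integrand is an indicator function, the midpoint rule is accurate only to about one step width; a finer grid tightens the agreement.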


Rather than deriving a general expression for the distribution of X + Y in the 
discrete case, we shall consider an example. 


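Example 2.37 (Sums of Independent Poisson Random Variables) Let X and Y be
independent Poisson random variables with respective means $\lambda_1$ and $\lambda_2$. Calculate
the distribution of X + Y.


Solution: Since the event $\{X + Y = n\}$ may be written as the union of the
disjoint events $\{X = k, Y = n - k\}$, $0 \le k \le n$, we have

$$\begin{aligned}
P\{X + Y = n\} &= \sum_{k=0}^{n} P\{X = k, Y = n - k\} \\
&= \sum_{k=0}^{n} P\{X = k\}P\{Y = n - k\} \\
&= \sum_{k=0}^{n} e^{-\lambda_1}\frac{\lambda_1^k}{k!}\, e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n - k)!} \\
&= e^{-(\lambda_1 + \lambda_2)} \sum_{k=0}^{n} \frac{\lambda_1^k \lambda_2^{n-k}}{k!\,(n - k)!} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{k=0}^{n} \frac{n!}{k!\,(n - k)!}\lambda_1^k \lambda_2^{n-k} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!}(\lambda_1 + \lambda_2)^n
\end{aligned}$$

In words, X + Y has a Poisson distribution with mean $\lambda_1 + \lambda_2$. ■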
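The sum over the disjoint events $\{X = k, Y = n - k\}$ translates directly into a discrete convolution. The following sketch, with hypothetical means 1.5 and 2.5 and illustrative function names, compares that convolution with the Poisson mass function having mean $\lambda_1 + \lambda_2$.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P{X = k} for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

def sum_pmf(n, lam1, lam2):
    """P{X + Y = n}, summing over the disjoint events {X = k, Y = n - k}."""
    return sum(poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2) for k in range(n + 1))

lam1, lam2 = 1.5, 2.5  # hypothetical means
max_gap = max(
    abs(sum_pmf(n, lam1, lam2) - poisson_pmf(n, lam1 + lam2)) for n in range(30)
)
print(f"largest discrepancy over n = 0..29: {max_gap:.2e}")
```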


The concept of independence may, of course, be defined for more than two
random variables. In general, the n random variables $X_1, X_2, \ldots, X_n$ are said
to be independent if, for all values $a_1, a_2, \ldots, a_n$,

$$P\{X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n\} = P\{X_1 \le a_1\}P\{X_2 \le a_2\} \cdots P\{X_n \le a_n\}$$

Example 2.38 Let $X_1, \ldots, X_n$ be independent and identically distributed
continuous random variables with probability distribution F and density
function $F' = f$. If we let $X_{(i)}$ denote the ith smallest of these random variables, then
$X_{(1)}, \ldots, X_{(n)}$ are called the order statistics. To obtain the distribution of $X_{(i)}$,
note that $X_{(i)}$ will be less than or equal to x if and only if at least i of the n random
variables $X_1, \ldots, X_n$ are less than or equal to x. Hence,

$$P\{X_{(i)} \le x\} = \sum_{k=i}^{n} \binom{n}{k} (F(x))^k (1 - F(x))^{n-k}$$

Differentiation yields that the density function of $X_{(i)}$ is as follows:

$$\begin{aligned}
f_{X_{(i)}}(x) &= f(x) \sum_{k=i}^{n} \binom{n}{k} k (F(x))^{k-1}(1 - F(x))^{n-k} \\
&\quad - f(x) \sum_{k=i}^{n} \binom{n}{k} (n - k)(F(x))^{k}(1 - F(x))^{n-k-1} \\
&= f(x) \sum_{k=i}^{n} \frac{n!}{(n - k)!\,(k - 1)!} (F(x))^{k-1}(1 - F(x))^{n-k} \\
&\quad - f(x) \sum_{k=i}^{n-1} \frac{n!}{(n - k - 1)!\,k!} (F(x))^{k}(1 - F(x))^{n-k-1} \\
&= f(x) \sum_{k=i}^{n} \frac{n!}{(n - k)!\,(k - 1)!} (F(x))^{k-1}(1 - F(x))^{n-k} \\
&\quad - f(x) \sum_{j=i+1}^{n} \frac{n!}{(n - j)!\,(j - 1)!} (F(x))^{j-1}(1 - F(x))^{n-j} \\
&= \frac{n!}{(n - i)!\,(i - 1)!} f(x)(F(x))^{i-1}(1 - F(x))^{n-i}
\end{aligned}$$


The preceding density is quite intuitive, since in order for $X_{(i)}$ to equal x, $i - 1$
of the values $X_1, \ldots, X_n$ must be less than x; $n - i$ of them must be greater
than x; and one must be equal to x. Now, the probability density that every
member of a specified set of $i - 1$ of the $X_j$ is less than x, every member of
another specified set of $n - i$ is greater than x, and the remaining value is equal
to x is $(F(x))^{i-1}(1 - F(x))^{n-i} f(x)$. Therefore, since there are $n!/[(i - 1)!\,(n - i)!]$
different partitions of the n random variables into the three groups, we obtain
the preceding density function. ■
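The density and distribution function of $X_{(i)}$ can be checked against each other numerically. The sketch below specializes to uniform (0, 1) variables, so that $F(x) = x$ and $f(x) = 1$, integrates the density with a midpoint rule, and compares the result with the binomial-sum form of $P\{X_{(i)} \le x\}$. The choice of distribution, the values i = 2, n = 5, x = 0.3, and the function names are assumptions for illustration.

```python
from math import comb, factorial

def order_stat_cdf(i, n, F_x):
    """P{X_(i) <= x}: at least i of the n variables are <= x, where F_x = F(x)."""
    return sum(comb(n, k) * F_x ** k * (1 - F_x) ** (n - k) for k in range(i, n + 1))

def order_stat_density(i, n, F_x, f_x):
    """Density of X_(i) at x, given F_x = F(x) and f_x = f(x)."""
    coeff = factorial(n) // (factorial(n - i) * factorial(i - 1))
    return coeff * f_x * F_x ** (i - 1) * (1 - F_x) ** (n - i)

# Uniform (0, 1) case: F(x) = x and f(x) = 1 on (0, 1).
i, n, x, steps = 2, 5, 0.3, 20_000
h = x / steps
integral = h * sum(order_stat_density(i, n, (j + 0.5) * h, 1.0) for j in range(steps))
cdf_value = order_stat_cdf(i, n, x)
print(f"integral of density: {integral:.6f}, distribution function: {cdf_value:.6f}")
```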


2.5.4 Joint Probability Distribution of Functions of Random Variables 


Let $X_1$ and $X_2$ be jointly continuous random variables with joint probability
density function $f(x_1, x_2)$. It is sometimes necessary to obtain the joint distribution
of the random variables $Y_1$ and $Y_2$ that arise as functions of $X_1$ and $X_2$.
Specifically, suppose that $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ for some functions $g_1$
and $g_2$.

Assume that the functions $g_1$ and $g_2$ satisfy the following conditions:


1. The equations $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ can be uniquely solved for $x_1$
and $x_2$ in terms of $y_1$ and $y_2$ with solutions given by, say, $x_1 = h_1(y_1, y_2)$,
$x_2 = h_2(y_1, y_2)$.
2. The functions $g_1$ and $g_2$ have continuous partial derivatives at all points $(x_1, x_2)$
and are such that the following 2 × 2 determinant

$$J(x_1, x_2) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} \\[2ex] \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} \end{vmatrix} \equiv \frac{\partial g_1}{\partial x_1}\frac{\partial g_2}{\partial x_2} - \frac{\partial g_1}{\partial x_2}\frac{\partial g_2}{\partial x_1} \neq 0$$

at all points $(x_1, x_2)$.




Under these two conditions it can be shown that the random variables $Y_1$ and $Y_2$
are jointly continuous with joint density function given by

$$f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(x_1, x_2)\,|J(x_1, x_2)|^{-1} \tag{2.19}$$

where $x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$.
A proof of Equation (2.19) would proceed along the following lines:

$$P\{Y_1 \le y_1, Y_2 \le y_2\} = \iint_{\substack{(x_1, x_2):\\ g_1(x_1, x_2) \le y_1 \\ g_2(x_1, x_2) \le y_2}} f_{X_1, X_2}(x_1, x_2)\,dx_1\,dx_2 \tag{2.20}$$

The joint density function can now be obtained by differentiating Equation (2.20)
with respect to $y_1$ and $y_2$. That the result of this differentiation will be equal to
the right-hand side of Equation (2.19) is an exercise in advanced calculus whose
proof will not be presented in the present text.


Example 2.39 If X and Y are independent gamma random variables with
parameters $(\alpha, \lambda)$ and $(\beta, \lambda)$, respectively, compute the joint density of U = X + Y and
V = X/(X + Y).


Solution: The joint density of X and Y is given by

$$\begin{aligned}
f_{X,Y}(x, y) &= \frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}\,\frac{\lambda e^{-\lambda y}(\lambda y)^{\beta - 1}}{\Gamma(\beta)} \\
&= \frac{\lambda^{\alpha + \beta}}{\Gamma(\alpha)\Gamma(\beta)}\, e^{-\lambda(x + y)} x^{\alpha - 1} y^{\beta - 1}
\end{aligned}$$

Now, if $g_1(x, y) = x + y$, $g_2(x, y) = x/(x + y)$, then

$$\frac{\partial g_1}{\partial x} = \frac{\partial g_1}{\partial y} = 1, \qquad \frac{\partial g_2}{\partial x} = \frac{y}{(x + y)^2}, \qquad \frac{\partial g_2}{\partial y} = -\frac{x}{(x + y)^2}$$

and so

$$J(x, y) = \begin{vmatrix} 1 & 1 \\[1ex] \dfrac{y}{(x + y)^2} & \dfrac{-x}{(x + y)^2} \end{vmatrix} = -\frac{1}{x + y}$$

Finally, because the equations $u = x + y$, $v = x/(x + y)$ have as their solutions
$x = uv$, $y = u(1 - v)$, we see that

$$\begin{aligned}
f_{U,V}(u, v) &= f_{X,Y}[uv, u(1 - v)]\,u \\
&= \frac{\lambda e^{-\lambda u}(\lambda u)^{\alpha + \beta - 1}}{\Gamma(\alpha + \beta)}\,\frac{v^{\alpha - 1}(1 - v)^{\beta - 1}\,\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}
\end{aligned}$$

Hence X + Y and X/(X + Y) are independent, with X + Y having a
gamma distribution with parameters $(\alpha + \beta, \lambda)$ and X/(X + Y) having density
function

$$f_V(v) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, v^{\alpha - 1}(1 - v)^{\beta - 1}, \qquad 0 < v < 1$$

This is called the beta density with parameters $(\alpha, \beta)$.

This result is quite interesting. For suppose there are n + m jobs to be
performed, with each (independently) taking an exponential amount of time
with rate $\lambda$ for performance, and suppose that we have two workers to perform
these jobs. Worker I will do jobs 1, 2, ..., n, and worker II will do the remaining
m jobs. If we let X and Y denote the total working times of workers I and II,
respectively, then upon using the preceding result it follows that X and Y will
be independent gamma random variables having parameters $(n, \lambda)$ and $(m, \lambda)$,
respectively. Then the preceding result yields that independently of the working
time needed to complete all n + m jobs (that is, of X + Y), the proportion
of this work that will be performed by worker I has a beta distribution with
parameters (n, m). ■
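The factorization in Example 2.39 can be spot-checked numerically: at any point (u, v), the transformed joint density $f_{X,Y}[uv, u(1 - v)]\,u$ should equal the product of a gamma density with parameters $(\alpha + \beta, \lambda)$ evaluated at u and a beta density with parameters $(\alpha, \beta)$ evaluated at v. The sketch below does this on a small grid. The parameter values and function names are hypothetical choices for illustration.

```python
from math import exp, gamma

def gamma_pdf(x, shape, lam):
    """Gamma density with parameters (shape, lam), in the form used in the example."""
    return lam * exp(-lam * x) * (lam * x) ** (shape - 1) / gamma(shape)

def beta_pdf(v, a, b):
    """Beta density with parameters (a, b) on (0, 1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * v ** (a - 1) * (1 - v) ** (b - 1)

def joint_uv(u, v, a, b, lam):
    """f_{U,V}(u, v) = f_{X,Y}(uv, u(1 - v)) |J|^{-1}, where |J|^{-1} = u."""
    x, y = u * v, u * (1 - v)
    return gamma_pdf(x, a, lam) * gamma_pdf(y, b, lam) * u

a, b, lam = 2.5, 1.5, 0.8  # hypothetical values of alpha, beta, lambda
grid = [(u, v) for u in (0.5, 1.0, 3.0, 7.0) for v in (0.1, 0.4, 0.9)]
max_rel_gap = max(
    abs(joint_uv(u, v, a, b, lam) / (gamma_pdf(u, a + b, lam) * beta_pdf(v, a, b)) - 1)
    for u, v in grid
)
print(f"largest relative discrepancy on the grid: {max_rel_gap:.2e}")
```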


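When the joint density function of the n random variables $X_1, X_2, \ldots, X_n$ is
given and we want to compute the joint density function of $Y_1, Y_2, \ldots, Y_n$, where

$$Y_1 = g_1(X_1, \ldots, X_n), \quad Y_2 = g_2(X_1, \ldots, X_n), \quad \ldots, \quad Y_n = g_n(X_1, \ldots, X_n)$$

the approach is the same. Namely, we assume that the functions $g_i$ have
continuous partial derivatives and that the Jacobian determinant $J(x_1, \ldots, x_n) \neq 0$ at
all points $(x_1, \ldots, x_n)$, where

$$J(x_1, \ldots, x_n) = \begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\[2ex]
\dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \cdots & \dfrac{\partial g_2}{\partial x_n} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n}
\end{vmatrix}$$

Furthermore, we suppose that the equations $y_1 = g_1(x_1, \ldots, x_n)$,
$y_2 = g_2(x_1, \ldots, x_n)$, ..., $y_n = g_n(x_1, \ldots, x_n)$ have a unique solution, say,
$x_1 = h_1(y_1, \ldots, y_n)$, ..., $x_n = h_n(y_1, \ldots, y_n)$. Under these assumptions the joint density
function of the random variables $Y_i$ is given by

$$f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\,|J(x_1, \ldots, x_n)|^{-1}$$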


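where $x_i = h_i(y_1, \ldots, y_n)$, $i = 1, 2, \ldots, n$.


2.6 Moment Generating Functions


The moment generating function $\phi(t)$ of the random variable X is defined for all
values t by

$$\phi(t) = E[e^{tX}] = \begin{cases} \displaystyle\sum_x e^{tx} p(x), & \text{if $X$ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if $X$ is continuous} \end{cases}$$

We call $\phi(t)$ the moment generating function because all of the moments of X
can be obtained by successively differentiating $\phi(t)$. For example,

$$\begin{aligned}
\phi'(t) &= \frac{d}{dt} E[e^{tX}] \\
&= E\left[\frac{d}{dt}(e^{tX})\right] \\
&= E[Xe^{tX}]
\end{aligned}$$

Hence,

$$\phi'(0) = E[X]$$

Similarly,

$$\begin{aligned}
\phi''(t) &= \frac{d}{dt}\phi'(t) \\
&= \frac{d}{dt} E[Xe^{tX}] \\
&= E\left[\frac{d}{dt}(Xe^{tX})\right] \\
&= E[X^2 e^{tX}]
\end{aligned}$$

and so

$$\phi''(0) = E[X^2]$$

In general, the nth derivative of $\phi(t)$ evaluated at t = 0 equals $E[X^n]$, that is,

$$\phi^{(n)}(0) = E[X^n], \qquad n \ge 1$$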


We now compute $\phi(t)$ for some common distributions.

Example 2.40 (The Binomial Distribution with Parameters n and p)

$$\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1 - p)^{n-k} \\
&= (pe^t + 1 - p)^n
\end{aligned}$$

Hence,

$$\phi'(t) = n(pe^t + 1 - p)^{n-1} pe^t$$

and so

$$E[X] = \phi'(0) = np$$

which checks with the result obtained in Example 2.17. Differentiating a second
time yields

$$\phi''(t) = n(n - 1)(pe^t + 1 - p)^{n-2}(pe^t)^2 + n(pe^t + 1 - p)^{n-1} pe^t$$

and so

$$E[X^2] = \phi''(0) = n(n - 1)p^2 + np$$

Thus, the variance of X is given by

$$\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
&= n(n - 1)p^2 + np - n^2p^2 \\
&= np(1 - p)
\end{aligned}$$ ■
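The relations $\phi'(0) = E[X]$ and $\phi''(0) = E[X^2]$ can be illustrated numerically by replacing the derivatives with central finite differences. The sketch below does so for the binomial moment generating function. The step size h, the parameters n = 10 and p = 0.3, and the function names are assumptions chosen for illustration, and finite differences are only an approximation to the exact derivatives.

```python
from math import exp

def binomial_mgf(t, n, p):
    """Moment generating function (p e^t + 1 - p)^n of a binomial (n, p) variable."""
    return (p * exp(t) + 1 - p) ** n

def first_two_moments(mgf, h=1e-4):
    """Central-difference estimates of phi'(0) = E[X] and phi''(0) = E[X^2]."""
    first = (mgf(h) - mgf(-h)) / (2 * h)
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)
    return first, second

n, p = 10, 0.3  # hypothetical parameters
m1, m2 = first_two_moments(lambda t: binomial_mgf(t, n, p))
variance = m2 - m1 ** 2
print(f"E[X] is about {m1:.4f} (np = {n * p}); Var(X) is about {variance:.4f}")
```

The same helper can be applied to any of the moment generating functions in this section by passing a different function of t.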


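Example 2.41 (The Poisson Distribution with Mean $\lambda$)

$$\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \sum_{n=0}^{\infty} \frac{e^{tn} e^{-\lambda} \lambda^n}{n!} \\
&= e^{-\lambda} \sum_{n=0}^{\infty} \frac{(\lambda e^t)^n}{n!} \\
&= e^{-\lambda} e^{\lambda e^t} \\
&= \exp\{\lambda(e^t - 1)\}
\end{aligned}$$

Differentiation yields

$$\phi'(t) = \lambda e^t \exp\{\lambda(e^t - 1)\},$$

$$\phi''(t) = (\lambda e^t)^2 \exp\{\lambda(e^t - 1)\} + \lambda e^t \exp\{\lambda(e^t - 1)\}$$

and so

$$\begin{aligned}
E[X] &= \phi'(0) = \lambda, \\
E[X^2] &= \phi''(0) = \lambda^2 + \lambda, \\
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \lambda
\end{aligned}$$

Thus, both the mean and the variance of the Poisson equal $\lambda$. ■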


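Example 2.42 (The Exponential Distribution with Parameter $\lambda$)

$$\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx \\
&= \lambda \int_0^{\infty} e^{-(\lambda - t)x}\,dx \\
&= \frac{\lambda}{\lambda - t} \qquad \text{for } t < \lambda
\end{aligned}$$

We note by the preceding derivation that, for the exponential distribution, $\phi(t)$
is only defined for values of t less than $\lambda$. Differentiation of $\phi(t)$ yields

$$\phi'(t) = \frac{\lambda}{(\lambda - t)^2}, \qquad \phi''(t) = \frac{2\lambda}{(\lambda - t)^3}$$

Hence,

$$E[X] = \phi'(0) = \frac{1}{\lambda}, \qquad E[X^2] = \phi''(0) = \frac{2}{\lambda^2}$$

The variance of X is thus given by

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2}$$ ■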


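Example 2.43 (The Normal Distribution with Parameters $\mu$ and $\sigma^2$) The moment
generating function of a standard normal random variable Z is obtained as
follows.

$$\begin{aligned}
E[e^{tZ}] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-x^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x^2 - 2tx)/2}\,dx \\
&= e^{t^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - t)^2/2}\,dx \\
&= e^{t^2/2}
\end{aligned}$$

If Z is a standard normal, then $X = \sigma Z + \mu$ is normal with parameters $\mu$ and
$\sigma^2$; therefore,

$$\phi(t) = E[e^{tX}] = E[e^{t(\sigma Z + \mu)}] = e^{t\mu} E[e^{t\sigma Z}] = \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\}$$

By differentiating we obtain

$$\phi'(t) = (\mu + t\sigma^2) \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\},$$

$$\phi''(t) = (\mu + t\sigma^2)^2 \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\} + \sigma^2 \exp\left\{\frac{\sigma^2 t^2}{2} + \mu t\right\}$$

and so

$$\begin{aligned}
E[X] &= \phi'(0) = \mu, \\
E[X^2] &= \phi''(0) = \mu^2 + \sigma^2
\end{aligned}$$

implying that

$$\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \sigma^2
\end{aligned}$$ ■
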
Tables 2.1 and 2.2 give the moment generating function for some common
distributions.

Table 2.1

| Discrete probability distribution | Probability mass function, $p(x)$ | Moment generating function, $\phi(t)$ | Mean | Variance |
|---|---|---|---|---|
| Binomial with parameters $n, p$, $0 \le p \le 1$ | $\binom{n}{x} p^x (1 - p)^{n-x}$, $x = 0, 1, \ldots, n$ | $(pe^t + (1 - p))^n$ | $np$ | $np(1 - p)$ |
| Poisson with parameter $\lambda > 0$ | $e^{-\lambda}\dfrac{\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$ | $\exp\{\lambda(e^t - 1)\}$ | $\lambda$ | $\lambda$ |
| Geometric with parameter $0 \le p \le 1$ | $p(1 - p)^{x-1}$, $x = 1, 2, \ldots$ | $\dfrac{pe^t}{1 - (1 - p)e^t}$ | $\dfrac{1}{p}$ | $\dfrac{1 - p}{p^2}$ |

Table 2.2

| Continuous probability distribution | Probability density function, $f(x)$ | Moment generating function, $\phi(t)$ | Mean | Variance |
|---|---|---|---|---|
| Uniform over $(a, b)$ | $f(x) = \dfrac{1}{b - a}$ for $a < x < b$; 0 otherwise | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ | $\dfrac{a + b}{2}$ | $\dfrac{(b - a)^2}{12}$ |
| Exponential with parameter $\lambda > 0$ | $f(x) = \lambda e^{-\lambda x}$ for $x > 0$; 0 for $x < 0$ | $\dfrac{\lambda}{\lambda - t}$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ |
| Gamma with parameters $(n, \lambda)$, $\lambda > 0$ | $f(x) = \dfrac{\lambda e^{-\lambda x}(\lambda x)^{n-1}}{(n - 1)!}$ for $x \ge 0$; 0 for $x < 0$ | $\left(\dfrac{\lambda}{\lambda - t}\right)^n$ | $\dfrac{n}{\lambda}$ | $\dfrac{n}{\lambda^2}$ |
| Normal with parameters $(\mu, \sigma^2)$ | $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\{-(x - \mu)^2/2\sigma^2\}$, $-\infty < x < \infty$ | $\exp\left\{\mu t + \dfrac{\sigma^2 t^2}{2}\right\}$ | $\mu$ | $\sigma^2$ |

An important property of moment generating functions is that the moment
generating function of the sum of independent random variables is just the
product of the individual moment generating functions. To see this, suppose that
X and Y are independent and have moment generating functions $\phi_X(t)$ and
$\phi_Y(t)$, respectively. Then $\phi_{X+Y}(t)$, the moment generating function of X + Y,
is given by

$$\begin{aligned}
\phi_{X+Y}(t) &= E[e^{t(X+Y)}] \\
&= E[e^{tX} e^{tY}] \\
&= E[e^{tX}]E[e^{tY}] \\
&= \phi_X(t)\phi_Y(t)
\end{aligned}$$

where the next to the last equality follows from Proposition 2.3 since X and Y
are independent.

Another important result is that the moment generating function uniquely
determines the distribution. That is, there exists a one-to-one correspondence
between the moment generating function and the distribution function of a
random variable.


Example 2.44 (Sums of Independent Binomial Random Variables) If X and Y
are independent binomial random variables with parameters (n, p) and (m, p),
respectively, then what is the distribution of X + Y?


Solution: The moment generating function of X + Y is given by

$$\begin{aligned}
\phi_{X+Y}(t) &= \phi_X(t)\phi_Y(t) = (pe^t + 1 - p)^n (pe^t + 1 - p)^m \\
&= (pe^t + 1 - p)^{m+n}
\end{aligned}$$

But $(pe^t + (1 - p))^{m+n}$ is just the moment generating function of a binomial
random variable having parameters m + n and p. Thus, this must be the
distribution of X + Y. ■
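The conclusion of Example 2.44 can be checked numerically in two ways: the mass function of X + Y obtained by direct convolution should match the binomial mass function with parameters m + n and p, and its moment generating function should equal the product of the two individual ones. The sketch below performs both comparisons. The parameters n = 4, m = 6, p = 0.35, the evaluation points for t, and the function names are hypothetical.

```python
from math import comb, exp

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def mgf_from_pmf(pmf, t):
    """E[e^{tX}] for a random variable taking values 0, 1, ..., len(pmf) - 1."""
    return sum(exp(t * k) * q for k, q in enumerate(pmf))

def convolve(pmf_x, pmf_y):
    """Mass function of X + Y for independent X and Y, as lists indexed by value."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py
    return out

n, m, p = 4, 6, 0.35  # hypothetical parameters
pmf_x = [binomial_pmf(k, n, p) for k in range(n + 1)]
pmf_y = [binomial_pmf(k, m, p) for k in range(m + 1)]
pmf_sum = convolve(pmf_x, pmf_y)
pmf_direct = [binomial_pmf(k, n + m, p) for k in range(n + m + 1)]

pmf_gap = max(abs(a - b) for a, b in zip(pmf_sum, pmf_direct))
mgf_gap = max(
    abs(mgf_from_pmf(pmf_sum, t) - mgf_from_pmf(pmf_x, t) * mgf_from_pmf(pmf_y, t))
    for t in (-1.0, 0.5, 1.0)
)
print(f"mass function gap {pmf_gap:.2e}, moment generating function gap {mgf_gap:.2e}")
```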


Example 2.45 (Sums of Independent Poisson Random Variables) Calculate the
distribution of X + Y when X and Y are independent Poisson random variables
with means $\lambda_1$ and $\lambda_2$, respectively.


Solution:

$$\begin{aligned}
\phi_{X+Y}(t) &= \phi_X(t)\phi_Y(t) \\
&= e^{\lambda_1(e^t - 1)} e^{\lambda_2(e^t - 1)} \\
&= e^{(\lambda_1 + \lambda_2)(e^t - 1)}
\end{aligned}$$

Hence, X + Y is Poisson distributed with mean $\lambda_1 + \lambda_2$, verifying the result
given in Example 2.37. ■


Example 2.46 (Sums of Independent Normal Random Variables) Show that if X
and Y are independent normal random variables with parameters $(\mu_1, \sigma_1^2)$ and
$(\mu_2, \sigma_2^2)$, respectively, then X + Y is normal with mean $\mu_1 + \mu_2$ and variance
$\sigma_1^2 + \sigma_2^2$.


Solution:

$$\begin{aligned}
\phi_{X+Y}(t) &= \phi_X(t)\phi_Y(t) \\
&= \exp\left\{\frac{\sigma_1^2 t^2}{2} + \mu_1 t\right\} \exp\left\{\frac{\sigma_2^2 t^2}{2} + \mu_2 t\right\} \\
&= \exp\left\{\frac{(\sigma_1^2 + \sigma_2^2)t^2}{2} + (\mu_1 + \mu_2)t\right\}
\end{aligned}$$

which is the moment generating function of a normal random variable with
mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. Hence, the result follows since the
moment generating function uniquely determines the distribution. ■


Example 2.47 (The Poisson Paradigm) We showed in Section 2.2.4 that the number
of successes that occur in n independent trials, each of which results in a success
with probability p is, when n is large and p small, approximately a Poisson
random variable with parameter $\lambda = np$. This result, however, can be substantially
strengthened. First it is not necessary that the trials have the same success
probability, only that all the success probabilities are small. To see that this is
the case, suppose that the trials are independent, with trial i resulting in a success
with probability $p_i$, where all the $p_i$, $i = 1, \ldots, n$ are small. Letting $X_i$ equal 1 if
trial i is a success, and 0 otherwise, it follows that the number of successes, call
it X, can be expressed as


$$X = \sum_{i=1}^{n} X_i$$

Using that $X_i$ is a Bernoulli (or binary) random variable, its moment generating
function is

$$E[e^{tX_i}] = p_i e^t + 1 - p_i = 1 + p_i(e^t - 1)$$

Now, using the result that, for |x| small,

$$e^x \approx 1 + x$$

it follows, because $p_i(e^t - 1)$ is small when $p_i$ is small, that

$$E[e^{tX_i}] = 1 + p_i(e^t - 1) \approx \exp\{p_i(e^t - 1)\}$$

Because the moment generating function of a sum of independent random
variables is the product of their moment generating functions, the preceding implies
that

$$E[e^{tX}] \approx \prod_{i=1}^{n} \exp\{p_i(e^t - 1)\} = \exp\left\{\sum_i p_i(e^t - 1)\right\}$$

But the right side of the preceding is the moment generating function of a Poisson
random variable with mean $\sum_i p_i$, thus arguing that this is approximately the
distribution of X.

Not only is it not necessary for the trials to have the same success probability for
the number of successes to approximately have a Poisson distribution, they need
not even be independent, provided that their dependence is weak. For instance,
recall the matching problem (Example 2.31) where n people randomly select hats
from a set consisting of one hat from each person. By regarding the random
selections of hats as constituting n trials, where we say that trial i is a success if
person i chooses his or her own hat, it follows that, with $A_i$ being the event that
trial i is a success,

$$P(A_i) = \frac{1}{n} \quad \text{and} \quad P(A_i \mid A_j) = \frac{1}{n - 1}, \quad j \neq i$$

Hence, whereas the trials are not independent, their dependence appears, for
large n, to be weak. Because of this weak dependence, and the small trial success
probabilities, it would seem that the number of matches should approximately
have a Poisson distribution with mean 1 when n is large, and this is shown to be
the case in Example 3.23.

The statement that “the number of successes in n trials that are either
independent or at most weakly dependent is, when the trial success probabilities are
all small, approximately a Poisson random variable” is known as the Poisson
paradigm. ■
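To see the Poisson paradigm at work for independent trials with unequal success probabilities, the sketch below computes the exact distribution of the number of successes by building it up one trial at a time, then compares it with the Poisson mass function having mean $\sum_i p_i$. The 50 success probabilities (ranging from 0.01 to 0.059) are hypothetical, the function names are illustrative, and total variation distance is used here only as one convenient way to summarize the discrepancy.

```python
from math import exp, factorial

def success_count_pmf(probs):
    """Exact mass function of the number of successes in independent trials
    with the given success probabilities."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)   # this trial is a failure
            nxt[k + 1] += q * p     # this trial is a success
        pmf = nxt
    return pmf

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical unequal, small success probabilities for 50 trials.
probs = [0.01 + 0.001 * i for i in range(50)]
lam = sum(probs)
exact = success_count_pmf(probs)

# Half the sum of absolute differences over 0..50; the Poisson mass beyond 50
# is negligible for a mean this small.
total_variation = 0.5 * sum(abs(q - poisson_pmf(k, lam)) for k, q in enumerate(exact))
print(f"mean {lam:.3f}, total variation distance {total_variation:.4f}")
```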


Remark For a nonnegative random variable X, it is often convenient to define
its Laplace transform $g(t)$, $t \ge 0$, by

$$g(t) = \phi(-t) = E[e^{-tX}]$$

That is, the Laplace transform evaluated at t is just the moment generating
function evaluated at $-t$. The advantage of dealing with the Laplace transform, rather
than the moment generating function, when the random variable is nonnegative
is that if $X \ge 0$ and $t \ge 0$, then

$$0 \le e^{-tX} \le 1$$

That is, the Laplace transform is always between 0 and 1. As in the case of
moment generating functions, it remains true that nonnegative random variables
that have the same Laplace transform must also have the same distribution. ■


It is also possible to define the joint moment generating function of two or
more random variables. This is done as follows. For any n random variables
$X_1, \ldots, X_n$, the joint moment generating function, $\phi(t_1, \ldots, t_n)$, is defined for all
real values of $t_1, \ldots, t_n$ by

$$\phi(t_1, \ldots, t_n) = E[e^{(t_1X_1 + \cdots + t_nX_n)}]$$


It can be shown that $\phi(t_1, \ldots, t_n)$ uniquely determines the joint distribution of
$X_1, \ldots, X_n$.

Example 2.48 (The Multivariate Normal Distribution) Let $Z_1, \ldots, Z_n$ be a set of
n independent standard normal random variables. If, for some constants $a_{ij}$,
$1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,

$$\begin{aligned}
X_1 &= a_{11}Z_1 + \cdots + a_{1n}Z_n + \mu_1, \\
X_2 &= a_{21}Z_1 + \cdots + a_{2n}Z_n + \mu_2, \\
&\;\;\vdots \\
X_i &= a_{i1}Z_1 + \cdots + a_{in}Z_n + \mu_i, \\
&\;\;\vdots \\
X_m &= a_{m1}Z_1 + \cdots + a_{mn}Z_n + \mu_m
\end{aligned}$$

then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal
distribution.

It follows from the fact that the sum of independent normal random variables
is itself a normal random variable that each $X_i$ is a normal random variable with
mean and variance given by

$$E[X_i] = \mu_i,$$

$$\mathrm{Var}(X_i) = \sum_{j=1}^{n} a_{ij}^2$$

Let us now determine

$$\phi(t_1, \ldots, t_m) = E[\exp\{t_1X_1 + \cdots + t_mX_m\}]$$

the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that
since $\sum_{i=1}^{m} t_iX_i$ is itself a linear combination of the independent normal random
variables $Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are
respectively

$$E\left[\sum_{i=1}^{m} t_iX_i\right] = \sum_{i=1}^{m} t_i\mu_i$$

and

$$\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{m} t_iX_i\right) &= \mathrm{Cov}\left(\sum_{i=1}^{m} t_iX_i, \sum_{j=1}^{m} t_jX_j\right) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} t_it_j\,\mathrm{Cov}(X_i, X_j)
\end{aligned}$$

Now, if Y is a normal random variable with mean $\mu$ and variance $\sigma^2$, then

$$E[e^Y] = \phi(t)|_{t=1} = e^{\mu + \sigma^2/2}$$

Thus, we see that

$$\phi(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^{m} t_i\mu_i + \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} t_it_j\,\mathrm{Cov}(X_i, X_j)\right\}$$

which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined
from a knowledge of the values of $E[X_i]$ and $\mathrm{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$. ■


2.6.1 The Joint Distribution of the Sample Mean and Sample Variance 
from a Normal Population 


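Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, each
with mean $\mu$ and variance $\sigma^2$. The random variable $S^2$ defined by

$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}$$

is called the sample variance of these data. To compute $E[S^2]$ we use the identity

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \tag{2.21}$$

which is proven as follows:

$$\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2 \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\mu - \bar{X})^2 + 2(\mu - \bar{X}) \sum_{i=1}^{n} (X_i - \mu) \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\mu - \bar{X})^2 + 2(\mu - \bar{X})(n\bar{X} - n\mu) \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\mu - \bar{X})^2 - 2n(\mu - \bar{X})^2
\end{aligned}$$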
i=1 


and Identity (2.21) follows.

Using Identity (2.21) gives

$$\begin{aligned}
E[(n - 1)S^2] &= \sum_{i=1}^{n} E[(X_i - \mu)^2] - nE[(\bar{X} - \mu)^2] \\
&= n\sigma^2 - n\,\mathrm{Var}(\bar{X}) \\
&= (n - 1)\sigma^2 \qquad \text{from Proposition 2.4(b)}
\end{aligned}$$

Thus, we obtain from the preceding that

$$E[S^2] = \sigma^2$$
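The identity $E[S^2] = \sigma^2$ holds for any common distribution with finite variance, which can be confirmed exactly on a small example by enumerating every possible sample. In the sketch below the population distribution (uniform on the three values 0, 1, 4) and the sample size n = 3 are hypothetical choices, and exact rational arithmetic is used so that the comparison is an equality.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """S^2 = sum of (x_i - xbar)^2 divided by n - 1, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical population distribution: X is uniform on {0, 1, 4}.
values = [0, 1, 4]
n = 3
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

# Every sample (x_1, ..., x_n) is equally likely, so E[S^2] is a plain average.
samples = list(product(values, repeat=n))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)
print(expected_s2, sigma2)
```

Dividing by n instead of n - 1 in `sample_variance` would make the average fall short of $\sigma^2$, which is the reason for the n - 1 in the definition.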


We will now determine the joint distribution of the sample mean $\bar{X} = \sum_{i=1}^{n} X_i/n$
and the sample variance $S^2$ when the $X_i$ have a normal distribution.
To begin we need the concept of a chi-squared random variable.


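Definition 2.2 If $Z_1, \ldots, Z_n$ are independent standard normal random variables,
then the random variable $\sum_{i=1}^{n} Z_i^2$ is said to be a chi-squared random variable
with n degrees of freedom.

We shall now compute the moment generating function of $\sum_{i=1}^{n} Z_i^2$. To begin,
note that

$$\begin{aligned}
E[\exp\{tZ_i^2\}] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2} e^{-x^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\,dx \qquad \text{where } \sigma^2 = (1 - 2t)^{-1} \\
&= \sigma \\
&= (1 - 2t)^{-1/2}
\end{aligned}$$

Hence,

$$E\left[\exp\left\{t \sum_{i=1}^{n} Z_i^2\right\}\right] = \prod_{i=1}^{n} E[\exp\{tZ_i^2\}] = (1 - 2t)^{-n/2}$$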


Now, let $X_1, \ldots, X_n$ be independent normal random variables, each with mean
$\mu$ and variance $\sigma^2$, and let $\bar{X} = \sum_{i=1}^{n} X_i/n$ and $S^2$ denote their sample mean and
sample variance. Since the sum of independent normal random variables is also
a normal random variable, it follows that $\bar{X}$ is a normal random variable with
expected value $\mu$ and variance $\sigma^2/n$. In addition, from Proposition 2.4,


$$\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0, \qquad i = 1, \ldots, n \tag{2.22}$$

Also, since $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ are all linear combinations of
the independent standard normal random variables $(X_i - \mu)/\sigma$, $i = 1, \ldots, n$,
it follows that the random variables $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ have a
joint distribution that is multivariate normal. However, if we let Y be a normal
random variable with mean $\mu$ and variance $\sigma^2/n$ that is independent of
$X_1, \ldots, X_n$, then the random variables $Y, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ also have
a multivariate normal distribution, and by Equation (2.22), they have the same
expected values and covariances as the random variables $\bar{X}, X_i - \bar{X}$, $i = 1, \ldots, n$.
Thus, since a multivariate normal distribution is completely determined by its
expected values and covariances, we can conclude that the random vectors
$Y, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ and $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ have the
same joint distribution; thus showing that $\bar{X}$ is independent of the sequence of
deviations $X_i - \bar{X}$, $i = 1, \ldots, n$.

Since $\bar{X}$ is independent of the sequence of deviations $X_i - \bar{X}$, $i = 1, \ldots, n$, it
follows that it is also independent of the sample variance

$$S^2 \equiv \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}$$

To determine the distribution of $S^2$, use Identity (2.21) to obtain

$$(n - 1)S^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2$$

Dividing both sides of this equation by $\sigma^2$ yields

$$\frac{(n - 1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \tag{2.23}$$

Now, $\sum_{i=1}^{n} (X_i - \mu)^2/\sigma^2$ is the sum of the squares of n independent standard
normal random variables, and so is a chi-squared random variable with
n degrees of freedom; it thus has moment generating function $(1 - 2t)^{-n/2}$. Also
$[(\bar{X} - \mu)/(\sigma/\sqrt{n})]^2$ is the square of a standard normal random variable and so is
a chi-squared random variable with one degree of freedom; it thus has moment
generating function $(1 - 2t)^{-1/2}$. In addition, we have previously seen that the
two random variables on the left side of Equation (2.23) are independent.
Therefore, because the moment generating function of the sum of independent random
variables is equal to the product of their individual moment generating functions,
we obtain that


$$E[e^{t(n-1)S^2/\sigma^2}](1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}$$

or

$$E[e^{t(n-1)S^2/\sigma^2}] = (1 - 2t)^{-(n-1)/2}$$

But because $(1 - 2t)^{-(n-1)/2}$ is the moment generating function of a chi-squared
random variable with $n - 1$ degrees of freedom, we can conclude, since the
moment generating function uniquely determines the distribution of the random
variable, that this is the distribution of $(n - 1)S^2/\sigma^2$.

Summing up, we have shown the following.


Proposition 2.5 If $X_1, \ldots, X_n$ are independent and identically distributed
normal random variables with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$
and the sample variance $S^2$ are independent. $\bar{X}$ is a normal random variable with
mean $\mu$ and variance $\sigma^2/n$; $(n - 1)S^2/\sigma^2$ is a chi-squared random variable with
$n - 1$ degrees of freedom.
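Proposition 2.5 lends itself to a simulation check. The sketch below draws many normal samples with a fixed seed, records $\bar{X}$ and $(n - 1)S^2/\sigma^2$ for each, and examines three consequences: the scaled sample variance should average about $n - 1$ and have variance about $2(n - 1)$ (the standard mean and variance of a chi-squared random variable with $n - 1$ degrees of freedom, facts not derived in the text), and its sample correlation with $\bar{X}$ should be near zero. Zero correlation is only a necessary consequence of independence, so this is a plausibility check rather than a proof. All parameter values, the seed, and the function names are assumptions.

```python
import random

def sample_mean_and_variance(xs):
    """Return (mean, unbiased sample variance) of a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, sum((x - mean) ** 2 for x in xs) / (n - 1)

def correlation(a, b):
    """Sample correlation coefficient of two equally long lists."""
    ma, va = sample_mean_and_variance(a)
    mb, vb = sample_mean_and_variance(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (va * vb) ** 0.5

n, mu, sigma, reps = 5, 10.0, 2.0, 20_000  # hypothetical parameters
rng = random.Random(12345)                  # fixed seed so the run is repeatable
means, scaled_vars = [], []
for _ in range(reps):
    xbar, s2 = sample_mean_and_variance([rng.gauss(mu, sigma) for _ in range(n)])
    means.append(xbar)
    scaled_vars.append((n - 1) * s2 / sigma ** 2)

chi2_mean, chi2_var = sample_mean_and_variance(scaled_vars)
corr = correlation(means, scaled_vars)
print(f"mean {chi2_mean:.3f}, variance {chi2_var:.3f}, correlation {corr:.4f}")
```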


2.7 The Distribution of the Number of Events that Occur 


Consider arbitrary events $A_1, \ldots, A_n$, and let X denote the number of these events
that occur. We will determine the probability mass function of X. To begin, for
$1 \le k \le n$, let

$$S_k = \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k})$$

equal the sum of the probabilities of all the $\binom{n}{k}$ intersections of k distinct events,
and note that the inclusion-exclusion identity states that

$$P(X > 0) = P(\cup_{i=1}^{n} A_i) = S_1 - S_2 + S_3 - \cdots + (-1)^{n+1} S_n$$

Now, fix k of the n events, say $A_{i_1}, \ldots, A_{i_k}$, and let

$$A = \cap_{j=1}^{k} A_{i_j}$$

be the event that all k of these events occur. Also, let

$$B = \cap_{j \notin \{i_1, \ldots, i_k\}} A_j^c$$

be the event that none of the other $n - k$ events occur. Consequently, AB is the
event that $A_{i_1}, \ldots, A_{i_k}$ are the only events to occur. Because


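$$A = AB \cup AB^c$$

we have

$$P(A) = P(AB) + P(AB^c)$$

or, equivalently,

$$P(AB) = P(A) - P(AB^c)$$

Because $B^c$ occurs if at least one of the events $A_j$, $j \notin \{i_1, \ldots, i_k\}$, occur, we
see that

$$B^c = \cup_{j \notin \{i_1, \ldots, i_k\}} A_j$$

Thus,

$$P(AB^c) = P(A \cup_{j \notin \{i_1, \ldots, i_k\}} A_j) = P(\cup_{j \notin \{i_1, \ldots, i_k\}} AA_j)$$

Applying the inclusion-exclusion identity gives

$$\begin{aligned}
P(AB^c) &= \sum_{j \notin \{i_1, \ldots, i_k\}} P(AA_j) - \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(AA_{j_1}A_{j_2}) \\
&\quad + \sum_{j_1 < j_2 < j_3 \notin \{i_1, \ldots, i_k\}} P(AA_{j_1}A_{j_2}A_{j_3}) - \cdots
\end{aligned}$$

Using that $A = \cap_{j=1}^{k} A_{i_j}$, the preceding shows that the probability that the k events
$A_{i_1}, \ldots, A_{i_k}$ are the only events to occur is

$$\begin{aligned}
P(A) - P(AB^c) &= P(A_{i_1} \cdots A_{i_k}) - \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_j) \\
&\quad + \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_{j_1}A_{j_2}) \\
&\quad - \sum_{j_1 < j_2 < j_3 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_{j_1}A_{j_2}A_{j_3}) + \cdots
\end{aligned}$$

Summing the preceding over all sets of k distinct indices yields

$$\begin{aligned}
P(X = k) &= \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k}) - \sum_{i_1 < \cdots < i_k} \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_j) \\
&\quad + \sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_{j_1}A_{j_2}) - \cdots
\end{aligned} \tag{2.24}$$

First, note that

$$\sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k}) = S_k$$

Now, consider

$$\sum_{i_1 < \cdots < i_k} \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_j)$$

The probability of every intersection of $k + 1$ distinct events $A_{m_1}, \ldots, A_{m_{k+1}}$ will
appear $\binom{k+1}{k}$ times in this multiple summation. This is so because each choice of
k of its indices to play the role of $i_1, \ldots, i_k$ and the other to play the role of j
results in the addition of the term $P(A_{m_1} \cdots A_{m_{k+1}})$. Hence,

$$\begin{aligned}
\sum_{i_1 < \cdots < i_k} \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_j) &= \binom{k+1}{k} \sum_{m_1 < \cdots < m_{k+1}} P(A_{m_1} \cdots A_{m_{k+1}}) \\
&= \binom{k+1}{k} S_{k+1}
\end{aligned}$$

Similarly, because the probability of every intersection of $k + 2$ distinct events
$A_{m_1}, \ldots, A_{m_{k+2}}$ will appear $\binom{k+2}{k}$ times in
$\sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_{j_1}A_{j_2})$, it follows that

$$\sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k}A_{j_1}A_{j_2}) = \binom{k+2}{k} S_{k+2}$$

Repeating this argument for the rest of the multiple summations in (2.24) yields
the result

$$P(X = k) = S_k - \binom{k+1}{k} S_{k+1} + \binom{k+2}{k} S_{k+2} - \cdots + (-1)^{n-k}\binom{n}{k} S_n$$

The preceding can be written as

$$P(X = k) = \sum_{j=k}^{n} (-1)^{k+j} \binom{j}{k} S_j$$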


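Using this we will now prove that

$$P(X \ge k) = \sum_{j=k}^{n} (-1)^{k+j} \binom{j-1}{k-1} S_j$$

The proof uses a backwards mathematical induction that starts with k = n. Now,
when k = n the preceding identity states that

$$P(X = n) = S_n$$

which is true. So assume that

$$P(X \ge k + 1) = \sum_{j=k+1}^{n} (-1)^{k+1+j} \binom{j-1}{k} S_j$$

But then

$$\begin{aligned}
P(X \ge k) &= P(X = k) + P(X \ge k + 1) \\
&= \sum_{j=k}^{n} (-1)^{k+j} \binom{j}{k} S_j + \sum_{j=k+1}^{n} (-1)^{k+1+j} \binom{j-1}{k} S_j \\
&= S_k + \sum_{j=k+1}^{n} (-1)^{k+j} \left[\binom{j}{k} - \binom{j-1}{k}\right] S_j \\
&= S_k + \sum_{j=k+1}^{n} (-1)^{k+j} \binom{j-1}{k-1} S_j \\
&= \sum_{j=k}^{n} (-1)^{k+j} \binom{j-1}{k-1} S_j
\end{aligned}$$

which completes the proof.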
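Both identities of this section can be verified exactly on a small example. The sketch below uses the matching problem with n = 4 hats, where $A_i$ is the event that person i selects his or her own hat and every one of the 24 assignments is equally likely. It computes the $S_j$ by brute force, evaluates the two formulas, and compares them with direct counts. The choice of example and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

def s_values(events, n_outcomes):
    """S[j] = sum of P(intersection) over all j-subsets of the events,
    for equally likely outcomes; S[0] is unused."""
    n = len(events)
    S = [Fraction(0)] * (n + 1)
    for j in range(1, n + 1):
        for subset in combinations(events, j):
            S[j] += Fraction(len(set.intersection(*subset)), n_outcomes)
    return S

def pmf_from_s(k, S):
    """P(X = k) = sum over j = k..n of (-1)^(k+j) C(j, k) S_j, for 1 <= k <= n."""
    n = len(S) - 1
    return sum((-1) ** (k + j) * comb(j, k) * S[j] for j in range(k, n + 1))

def tail_from_s(k, S):
    """P(X >= k) = sum over j = k..n of (-1)^(k+j) C(j-1, k-1) S_j, for 1 <= k <= n."""
    n = len(S) - 1
    return sum((-1) ** (k + j) * comb(j - 1, k - 1) * S[j] for j in range(k, n + 1))

# Matching problem: A_i is the set of permutations that leave position i fixed.
n = 4
outcomes = list(permutations(range(n)))
events = [{w for w in outcomes if w[i] == i} for i in range(n)]
S = s_values(events, len(outcomes))

def direct_pmf(k):
    """P(X = k) by counting permutations with exactly k fixed points."""
    hits = sum(1 for w in outcomes if sum(w[i] == i for i in range(n)) == k)
    return Fraction(hits, len(outcomes))

for k in range(1, n + 1):
    print(k, pmf_from_s(k, S), direct_pmf(k), tail_from_s(k, S))
```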
2.8 Limit Theorems


We start this section by proving a result known as Markov’s inequality.


Proposition 2.6 (Markov’s Inequality) If X is a random variable that takes only
nonnegative values, then for any value a > 0

$$P\{X \ge a\} \le \frac{E[X]}{a}$$

Proof. We give a proof for the case where X is continuous with density f.

$$\begin{aligned}
E[X] &= \int_0^{\infty} x f(x)\,dx \\
&= \int_0^{a} x f(x)\,dx + \int_a^{\infty} x f(x)\,dx \\
&\ge \int_a^{\infty} x f(x)\,dx \\
&\ge \int_a^{\infty} a f(x)\,dx \\
&= a \int_a^{\infty} f(x)\,dx \\
&= aP\{X \ge a\}
\end{aligned}$$

and the result is proven.
As a corollary, we obtain the following. 


Proposition 2.7 (Chebyshev’s Inequality) If X is a random variable with mean
$\mu$ and variance $\sigma^2$, then, for any value k > 0,

$$P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}$$

Proof. Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov’s
inequality (with $a = k^2$) to obtain

$$P\{(X - \mu)^2 \ge k^2\} \le \frac{E[(X - \mu)^2]}{k^2}$$

But since $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, the preceding is
equivalent to

$$P\{|X - \mu| \ge k\} \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$

and the proof is complete.


The importance of Markov’s and Chebyshev’s inequalities is that they enable 
us to derive bounds on probabilities when only the mean, or both the mean 
and the variance, of the probability distribution are known. Of course, if the 
actual distribution were known, then the desired probabilities could be exactly 
computed, and we would not need to resort to bounds. 


Example 2.49 Suppose we know that the number of items produced in a factory
during a week is a random variable with mean 500.


(a) What can be said about the probability that this week’s production will be at least
1000?

(b) If the variance of a week’s production is known to equal 100, then what can be said
about the probability that this week’s production will be between 400 and 600?

Solution: Let X be the number of items that will be produced in a week.


(a) By Markov’s inequality,

$$P\{X \ge 1000\} \le \frac{E[X]}{1000} = \frac{500}{1000} = \frac{1}{2}$$

(b) By Chebyshev’s inequality,

$$P\{|X - 500| \ge 100\} \le \frac{\sigma^2}{(100)^2} = \frac{1}{100}$$

Hence,

$$P\{|X - 500| < 100\} \ge 1 - \frac{1}{100} = \frac{99}{100}$$

and so the probability that this week’s production will be between 400 and
600 is at least 0.99. ■
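The two bounds are simple enough to wrap in small helper functions. The sketch below reproduces the numbers of Example 2.49 and then, as a sanity check, compares Chebyshev's bound with an exactly computed tail probability for one fully known distribution. The helper names and the binomial (20, 0.5) test case are assumptions, and capping the bounds at 1 is a convenience that is not part of the propositions.

```python
from math import comb

def markov_bound(mean, a):
    """Upper bound on P{X >= a} for a nonnegative X with the given mean (a > 0)."""
    return min(1.0, mean / a)

def chebyshev_bound(var, k):
    """Upper bound on P{|X - mu| >= k} for X with the given variance (k > 0)."""
    return min(1.0, var / k ** 2)

# The numbers of Example 2.49.
bound_a = markov_bound(500, 1000)          # bounds P{X >= 1000} from above
bound_b = 1 - chebyshev_bound(100, 100)    # bounds P{|X - 500| < 100} from below

# Sanity check against a hypothetical binomial (20, 0.5) random variable.
n, p = 20, 0.5
pmf = [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
exact_tail = sum(pmf[j] for j in range(n + 1) if abs(j - n * p) >= 5)
cheb = chebyshev_bound(n * p * (1 - p), 5)
print(bound_a, bound_b, exact_tail, cheb)
```

In the binomial case the exact tail probability is far below the Chebyshev bound, which reflects the remark in the text that such bounds are useful mainly when the distribution itself is unknown.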


The following theorem, known as the strong law of large numbers, is probably 
the most well-known result in probability theory. It states that the average of 
a sequence of independent random variables having the same distribution will, 
with probability 1, converge to the mean of that distribution. 


Theorem 2.1 (Strong Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of
independent random variables having a common distribution, and let $E[X_i] = \mu$.
Then, with probability 1,

$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty$$

As an example of the preceding, suppose that a sequence of independent trials
is performed. Let E be a fixed event and denote by P(E) the probability that E
occurs on any particular trial. Letting

$$X_i = \begin{cases} 1, & \text{if $E$ occurs on the $i$th trial} \\ 0, & \text{if $E$ does not occur on the $i$th trial} \end{cases}$$

we have by the strong law of large numbers that, with probability 1,

$$\frac{X_1 + \cdots + X_n}{n} \to E[X] = P(E) \tag{2.25}$$

Since $X_1 + \cdots + X_n$ represents the number of times that the event E occurs in the
first n trials, we may interpret Equation (2.25) as stating that, with probability
1, the limiting proportion of time that the event E occurs is just P(E).
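The interpretation of Equation (2.25) can be illustrated by simulating a long sequence of independent trials and tracking the proportion on which E occurs. The sketch below is one seeded run with a hypothetical value P(E) = 0.3. A single finite simulation can only suggest the limiting behavior that the theorem guarantees, and the seed, checkpoints, and function name are assumptions.

```python
import random

def running_proportion(p, n_trials, seed=2024):
    """Proportion of the first n trials on which E occurs, at a few checkpoints."""
    rng = random.Random(seed)
    checkpoints = {10, 100, 1_000, 10_000, 100_000}
    count, out = 0, {}
    for n in range(1, n_trials + 1):
        count += rng.random() < p   # X_n = 1 if E occurs on trial n, else 0
        if n in checkpoints:
            out[n] = count / n
    return out

p_event = 0.3  # hypothetical value of P(E)
proportions = running_proportion(p_event, 100_000)
for n, frac in sorted(proportions.items()):
    print(f"n = {n:>6}: proportion = {frac:.4f}")
```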
Running neck and neck with the strong law of large numbers for the honor
of being probability theory’s number one result is the central limit theorem. 
Besides its theoretical interest and importance, this theorem provides a simple 
method for computing approximate probabilities for sums of independent ran- 
dom variables. It also explains the remarkable fact that the empirical frequen- 
cies of so many natural “populations” exhibit a bell-shaped (that is, normal) 
curve. 

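Theorem 2.2 (Central Limit Theorem) Let $X_1, X_2, \ldots$ be a sequence of independent, 
identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. 
Then the distribution of 

$$\frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

tends to the standard normal as $n \to \infty$. That is, 

$$P\left\{\frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \leq a\right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx$$

as $n \to \infty$. 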


Note that like the other results of this section, this theorem holds for any 
distribution of the $X_i$s; herein lies its power. 

If X is binomially distributed with parameters n and p, then X has the same 
distribution as the sum of n independent Bernoulli random variables, each with 
parameter p. (Recall that the Bernoulli random variable is just a binomial random 
variable whose parameter n equals 1.) Hence, the distribution of 

$$\frac{X - E[X]}{\sqrt{\mathrm{Var}(X)}} = \frac{X - np}{\sqrt{np(1 - p)}}$$

approaches the standard normal distribution as n approaches $\infty$. The normal 
approximation will, in general, be quite good for values of n satisfying 
$np(1 - p) \geq 10$. 


Example 2.50 (Normal Approximation to the Binomial) Let X be the number 
of times that a fair coin, flipped 40 times, lands heads. Find the probability 
that X = 20. Use the normal approximation and then compare it to the exact 
solution. 


Solution: Since the binomial is a discrete random variable, and the normal 
a continuous random variable, it leads to a better approximation to write the 
desired probability as 

$$\begin{aligned}
P\{X = 20\} &= P\{19.5 \leq X < 20.5\} \\
&= P\left\{\frac{19.5 - 20}{\sqrt{10}} \leq \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}}\right\} \\
&= P\left\{-0.16 \leq \frac{X - 20}{\sqrt{10}} < 0.16\right\} \\
&\approx \Phi(0.16) - \Phi(-0.16)
\end{aligned}$$

where $\Phi(x)$, the probability that the standard normal is less than x, is given by 

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$

By the symmetry of the standard normal distribution 

$$\Phi(-0.16) = P\{N(0, 1) > 0.16\} = 1 - \Phi(0.16)$$

where N(0, 1) is a standard normal random variable. Hence, the desired 
probability is approximated by 

$$P\{X = 20\} \approx 2\Phi(0.16) - 1$$

Using Table 2.3, we obtain 

$$P\{X = 20\} \approx 0.1272$$

The exact result is 

$$P\{X = 20\} = \binom{40}{20}\left(\frac{1}{2}\right)^{40}$$

which can be shown to equal 0.1254. ■
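
The comparison in Example 2.50 can also be carried out numerically. The sketch below is an illustration only: it evaluates $\Phi$ through the error function in Python's standard library rather than through Table 2.3, and it does not round the standardized endpoints to 0.16, so its approximate value differs slightly from the table-based figure. The function names are illustrative.

```python
import math

def standard_normal_cdf(x):
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_approx_point_prob(n, p, k):
    """Approximate P{X = k} for X binomial(n, p), using the continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    upper = standard_normal_cdf((k + 0.5 - mean) / sd)
    lower = standard_normal_cdf((k - 0.5 - mean) / sd)
    return upper - lower

def exact_binomial_prob(n, p, k):
    """Exact binomial probability P{X = k}."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

approx = normal_approx_point_prob(40, 0.5, 20)
exact = exact_binomial_prob(40, 0.5, 20)
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial value: {exact:.4f}")
```

Here np(1 − p) = 10, which sits right at the rule of thumb stated before the example.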


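Example 2.51 Let $X_i$, $i = 1, 2, \ldots, 10$ be independent random variables, each 
being uniformly distributed over (0, 1). Estimate $P\{\sum_{i=1}^{10} X_i > 7\}$. 

Solution: Since $E[X_i] = \frac{1}{2}$, $\mathrm{Var}(X_i) = \frac{1}{12}$ we have by the central limit theorem 
that 

$$\begin{aligned}
P\left\{\sum_{i=1}^{10} X_i > 7\right\} &= P\left\{\frac{\sum_{i=1}^{10} X_i - 5}{\sqrt{10\left(\frac{1}{12}\right)}} > \frac{7 - 5}{\sqrt{10\left(\frac{1}{12}\right)}}\right\} \\
&\approx 1 - \Phi(2.2) \\
&= 0.0139
\end{aligned}$$ ■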
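
As a rough check on Example 2.51, the central limit approximation can be placed beside a Monte Carlo estimate. This is a sketch under stated assumptions: the trial count, the seed, and the function names are arbitrary illustrative choices, and the standardized value is left unrounded (about 2.19) instead of being rounded to 2.2 as in the table lookup.

```python
import math
import random

def standard_normal_cdf(x):
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clt_tail(n, mean, variance, threshold):
    """Central limit approximation to P{X_1 + ... + X_n > threshold}."""
    z = (threshold - n * mean) / math.sqrt(n * variance)
    return 1.0 - standard_normal_cdf(z)

def simulated_tail(n, threshold, trials, seed=0):
    """Fraction of simulated sums of n uniform(0, 1) variables that exceed threshold."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        if sum(rng.random() for _ in range(n)) > threshold:
            hits += 1
    return hits / trials

clt_estimate = clt_tail(10, 0.5, 1 / 12, 7)
mc_estimate = simulated_tail(10, 7, trials=100_000)
print(f"CLT approximation:    {clt_estimate:.4f}")
print(f"Monte Carlo estimate: {mc_estimate:.4f}")
```

With only ten terms the normal approximation is already close in the tail, though the simulated value carries sampling error of its own.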
Table 2.3. Area Φ(x) under the Standard Normal Curve to the Left of x 


x 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 


0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5597 0.5636 0.5675 0.5714 0.5753 
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 


0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 


1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 


1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 


2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 


2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 


3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990 
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993 
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995 
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997 
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998 


Example 2.52 The lifetime of a special type of battery is a random variable with 
mean 40 hours and standard deviation 20 hours. A battery is used until it fails, 
at which point it is replaced by a new one. Assuming a stockpile of 25 such 
batteries, the lifetimes of which are independent, approximate the probability 
that over 1100 hours of use can be obtained. 

Solution: If we let $X_i$ denote the lifetime of the ith battery to be put in use, 
then we desire $p = P\{X_1 + \cdots + X_{25} > 1100\}$, which is approximated as 
follows: 

$$\begin{aligned}
p &= P\left\{\frac{X_1 + \cdots + X_{25} - 1000}{20\sqrt{25}} > \frac{1100 - 1000}{20\sqrt{25}}\right\} \\
&\approx P\{N(0,1) > 1\} \\
&= 1 - \Phi(1) \\
&\approx 0.1587
\end{aligned}$$ ■


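We now present a heuristic proof of the central limit theorem. Suppose first that 
the $X_i$ have mean 0 and variance 1, and let $E[e^{tX}]$ denote their common moment 
generating function. Then, the moment generating function of $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ is 

$$\begin{aligned}
E\left[\exp\left\{t\left(\frac{X_1 + \cdots + X_n}{\sqrt{n}}\right)\right\}\right] &= E\left[e^{tX_1/\sqrt{n}}\, e^{tX_2/\sqrt{n}} \cdots e^{tX_n/\sqrt{n}}\right] \\
&= \left(E\left[e^{tX/\sqrt{n}}\right]\right)^n \quad \text{by independence}
\end{aligned}$$

Now, for n large, we obtain from the Taylor series expansion of $e^y$ that 

$$e^{tX/\sqrt{n}} \approx 1 + \frac{tX}{\sqrt{n}} + \frac{t^2X^2}{2n}$$

Taking expectations shows that when n is large 

$$\begin{aligned}
E\left[e^{tX/\sqrt{n}}\right] &\approx 1 + \frac{tE[X]}{\sqrt{n}} + \frac{t^2E[X^2]}{2n} \\
&= 1 + \frac{t^2}{2n} \quad \text{because } E[X] = 0,\ E[X^2] = 1
\end{aligned}$$

Therefore, we obtain that when n is large 

$$E\left[\exp\left\{t\left(\frac{X_1 + \cdots + X_n}{\sqrt{n}}\right)\right\}\right] \approx \left(1 + \frac{t^2}{2n}\right)^n$$

When n goes to $\infty$ the approximation can be shown to become exact and we 
have 

$$\lim_{n \to \infty} E\left[\exp\left\{t\left(\frac{X_1 + \cdots + X_n}{\sqrt{n}}\right)\right\}\right] = e^{t^2/2}$$

Thus, the moment generating function of $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ converges to the moment 
generating function of a (standard) normal random variable with mean 0 and 
variance 1. Using this, it can be proven that the distribution function of the 
random variable $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ converges to the standard normal distribution 
function $\Phi$. 

When the $X_i$ have mean $\mu$ and variance $\sigma^2$, the random variables $\frac{X_i - \mu}{\sigma}$ have 
mean 0 and variance 1. Thus, the preceding shows that 

$$P\left\{\frac{X_1 - \mu + X_2 - \mu + \cdots + X_n - \mu}{\sigma\sqrt{n}} \leq a\right\} \to \Phi(a)$$

which proves the central limit theorem. 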
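
The convergence of moment generating functions used in this heuristic argument can be observed numerically for one concrete distribution. As an assumed example (not one used in the text), take each $X_i$ to be +1 or −1 with probability 1/2, which has mean 0 and variance 1 and moment generating function $E[e^{tX}] = \cosh t$. The sketch below evaluates $(E[e^{tX/\sqrt{n}}])^n$ for increasing n and compares it with $e^{t^2/2}$.

```python
import math

def mgf_of_normalized_sum(t, n):
    """MGF at t of (X_1 + ... + X_n)/sqrt(n) when each X_i is +1 or -1 with probability 1/2.

    By independence this is (E[exp(t X / sqrt(n))])**n, and E[exp(s X)] = cosh(s).
    """
    return math.cosh(t / math.sqrt(n)) ** n

t = 1.0
limit = math.exp(t * t / 2)  # MGF of the standard normal at t
for n in (1, 10, 100, 10_000):
    value = mgf_of_normalized_sum(t, n)
    print(f"n = {n:>6}: MGF = {value:.6f}   (normal limit {limit:.6f})")
```

The choice t = 1 is arbitrary; other values of t show the same pattern.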


2.9 Stochastic Processes 


A stochastic process $\{X(t), t \in T\}$ is a collection of random variables. That is, 
for each $t \in T$, X(t) is a random variable. The index t is often interpreted as 
time and, as a result, we refer to X(t) as the state of the process at time t. For 
example, X(t) might equal the total number of customers that have entered a 
supermarket by time t; or the number of customers in the supermarket at time 
t; or the total amount of sales that have been recorded in the market by time t; 
etc. 

The set T is called the index set of the process. When T is a countable set 
the stochastic process is said to be a discrete-time process. If T is an interval of 
the real line, the stochastic process is said to be a continuous-time process. For 
instance, $\{X_n, n = 0, 1, \ldots\}$ is a discrete-time stochastic process indexed by the 
nonnegative integers; while $\{X(t), t \geq 0\}$ is a continuous-time stochastic process 
indexed by the nonnegative real numbers. 

The state space of a stochastic process is defined as the set of all possible values 
that the random variables X(t) can assume. 

Thus, a stochastic process is a family of random variables that describes the 
evolution through time of some (physical) process. We shall see much of stochastic 
processes in the following chapters of this text. 


Example 2.53 Consider a particle that moves along a set of m + 1 nodes, labeled 
0, 1, ..., m, that are arranged around a circle (see Figure 2.3). At each step the 
particle is equally likely to move one position in either the clockwise or 
counterclockwise direction. That is, if $X_n$ is the position of the particle after its 
nth step then 

$$P\{X_{n+1} = i + 1 \mid X_n = i\} = P\{X_{n+1} = i - 1 \mid X_n = i\} = \frac{1}{2}$$

where i + 1 ≡ 0 when i = m, and i − 1 ≡ m when i = 0. Suppose now that 
the particle starts at 0 and continues to move around according to the preceding 
rules until all the nodes 1, 2, ..., m have been visited. What is the probability that 
node i, i = 1, ..., m, is the last one visited? 

Figure 2.3. Particle moving around a circle. 


Solution: Surprisingly enough, the probability that node i is the last node 
visited can be determined without any computations. To do so, consider the 
first time that the particle is at one of the two neighbors of node i, that is, the 
first time that the particle is at one of the nodes i − 1 or i + 1 (with m + 1 ≡ 0). 
Suppose it is at node i − 1 (the argument in the alternative situation is identical). 
Since neither node i nor i + 1 has yet been visited, it follows that i will be the 
last node visited if and only if i + 1 is visited before i. This is so because in 
order to visit i + 1 before i the particle will have to visit all the nodes on the 
counterclockwise path from i − 1 to i + 1 before it visits i. But the probability 
that a particle at node i − 1 will visit i + 1 before i is just the probability that a 
particle will progress m − 1 steps in a specified direction before progressing one 
step in the other direction. That is, it is equal to the probability that a gambler 
who starts with one unit, and wins one when a fair coin turns up heads and 
loses one when it turns up tails, will have his fortune go up by m − 1 before he 
goes broke. Hence, because the preceding implies that the probability that node 
i is the last node visited is the same for all i, and because these probabilities 
must sum to 1, we obtain 

$$P\{i \text{ is the last node visited}\} = 1/m, \quad i = 1, \ldots, m$$ ■
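
The conclusion of Example 2.53 can be checked empirically by simulating the walk. The sketch below is an illustration under stated assumptions: the choice m = 4, the number of trials, the seed, and the function names are arbitrary and not part of the text.

```python
import random
from collections import Counter

def last_node_visited(m, rng):
    """Run one walk on nodes 0..m arranged in a circle, starting at 0.

    Returns the label of the last node among 1..m to be visited.
    """
    position = 0
    unvisited = set(range(1, m + 1))
    while True:
        step = 1 if rng.random() < 0.5 else -1  # either direction, equally likely
        position = (position + step) % (m + 1)  # wrap around the circle
        if position in unvisited:
            unvisited.remove(position)
            if not unvisited:
                return position

def estimate_last_node_distribution(m, trials, seed=0):
    """Estimate P{i is the last node visited} for i = 1..m by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(last_node_visited(m, rng) for _ in range(trials))
    return {i: counts[i] / trials for i in range(1, m + 1)}

distribution = estimate_last_node_distribution(m=4, trials=20_000)
for node, frequency in distribution.items():
    print(f"node {node}: estimated probability {frequency:.3f} (claimed 1/m = {1 / 4:.3f})")
```

The estimated frequencies should all be close to 1/m, including for the nodes adjacent to the starting point, which is the surprising part of the result.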


Remark The argument used in Example 2.53 also shows that a gambler who is 
equally likely to either win or lose one unit on each gamble will be down n before 
being up 1 with probability 1/(n + 1); or equivalently, 

$$P\{\text{gambler is up 1 before being down } n\} = \frac{n}{n+1}$$

Suppose now we want the probability that the gambler is up 2 before being 
down n. Upon conditioning on whether he reaches up 1 before down n, we 
obtain that 

$$\begin{aligned}
P\{\text{gambler is up 2 before being down } n\} &= P\{\text{up 2 before down } n \mid \text{up 1 before down } n\}\, \frac{n}{n+1} \\
&= P\{\text{up 1 before down } n + 1\}\, \frac{n}{n+1} \\
&= \frac{n+1}{n+2} \cdot \frac{n}{n+1} = \frac{n}{n+2}
\end{aligned}$$

Repeating this argument yields that 

$$P\{\text{gambler is up } k \text{ before being down } n\} = \frac{n}{n+k}$$
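
The repeated conditioning in this remark can be written as a short loop. The sketch below, whose function name is an illustrative choice, multiplies the one-step factors: once the gambler is up j, he is n + j above the lower barrier, so he gains one more unit before hitting that barrier with probability (n + j)/(n + j + 1).

```python
from fractions import Fraction

def prob_up_k_before_down_n(k, n):
    """P{fair gambler is up k before being down n}, built by repeated conditioning."""
    prob = Fraction(1)
    for j in range(k):
        # From level j the lower barrier is n + j units away, so the chance of
        # reaching level j + 1 first is (n + j)/(n + j + 1).
        prob *= Fraction(n + j, n + j + 1)
    return prob

for k, n in [(1, 5), (2, 5), (3, 10)]:
    print(f"up {k} before down {n}: {prob_up_k_before_down_n(k, n)}")
```

The product telescopes, which is why the result depends on k and n only through n/(n + k).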
Exercises 

1. An urn contains five red, three orange, and two blue balls. Two balls are randomly 
selected. What is the sample space of this experiment? Let X represent the number 
of orange balls selected. What are the possible values of X? Calculate P{X = 0}. 

2. Let X represent the difference between the number of heads and the number of tails 
obtained when a coin is tossed n times. What are the possible values of X? 

3. In Exercise 2, if the coin is assumed fair, then, for n = 2, what are the probabilities 
associated with the values that X can take on? 

4. Suppose a die is rolled twice. What are the possible values that the following random 
variables can take on? 
(a) The maximum value to appear in the two rolls. 
(b) The minimum value to appear in the two rolls. 
(c) The sum of the two rolls. 
(d) The value of the first roll minus the value of the second roll. 

5. If the die in Exercise 4 is assumed fair, calculate the probabilities associated with 
the random variables in (a) through (d). 

6. Suppose five fair coins are tossed. Let E be the event that all coins land heads. Define 
the random variable $I_E$ 

$$I_E = \begin{cases} 1, & \text{if } E \text{ occurs} \\ 0, & \text{if } E^c \text{ occurs} \end{cases}$$

For what outcomes in the original sample space does $I_E$ equal 1? What is $P\{I_E = 1\}$? 
7. Suppose a coin having probability 0.7 of coming up heads is tossed three times. 
Let X denote the number of heads that appear in the three tosses. Determine the 
probability mass function of X. 

8. Suppose the distribution function of X is given by 

$$F(b) = \begin{cases} 0, & b < 0 \\ \frac{1}{2}, & 0 \leq b < 1 \\ 1, & 1 \leq b < \infty \end{cases}$$

What is the probability mass function of X? 

9. If the distribution function of X is given by 

$$F(b) = \begin{cases} 0, & b < 0 \\ \frac{1}{2}, & 0 \leq b < 1 \\ \frac{3}{5}, & 1 \leq b < 2 \\ \frac{4}{5}, & 2 \leq b < 3 \\ \frac{9}{10}, & 3 \leq b < 3.5 \\ 1, & b \geq 3.5 \end{cases}$$

calculate the probability mass function of X. 

10. Suppose three fair dice are rolled. What is the probability at most one six appears? 

*11. A ball is drawn from an urn containing three white and three black balls. After the 
ball is drawn, it is then replaced and another ball is drawn. This goes on indefinitely. 
What is the probability that of the first four balls drawn, exactly two are white? 

12. On a multiple-choice exam with three possible answers for each of the five questions, 
what is the probability that a student would get four or more correct answers just 
by guessing? 

13. An individual claims to have extrasensory perception (ESP). As a test, a fair coin is 
flipped ten times, and he is asked to predict in advance the outcome. Our individual 
gets seven out of ten correct. What is the probability he would have done at least 
this well if he had no ESP? (Explain why the relevant probability is $P\{X \geq 7\}$ and 
not P{X = 7}.) 

14. Suppose X has a binomial distribution with parameters 6 and $\frac{1}{2}$. Show that X = 3 
is the most likely outcome. 

15. Let X be binomially distributed with parameters n and p. Show that as k goes from 
0 to n, P(X = k) increases monotonically, then decreases monotonically reaching 
its largest value 
(a) in the case that (n + 1)p is an integer, when k equals either (n + 1)p − 1 or 
(n + 1)p, 
(b) in the case that (n + 1)p is not an integer, when k satisfies (n + 1)p − 1 < k < 
(n + 1)p. 
Hint: Consider P{X = k}/P{X = k − 1} and see for what values of k it is greater 
or less than 1. 

*16. An airline knows that 5 percent of the people making reservations on a certain 
flight will not show up. Consequently, their policy is to sell 52 tickets for a flight 
that can hold only 50 passengers. What is the probability that there will be a seat 
available for every passenger who shows up? 

17. Suppose that an experiment can result in one of r possible outcomes, the ith outcome 
having probability $p_i$, $i = 1, \ldots, r$, $\sum_{i=1}^{r} p_i = 1$. If n of these experiments are 
performed, and if the outcome of any one of the n does not affect the outcome of 
the other n − 1 experiments, then show that the probability that the first outcome 
appears $x_1$ times, the second $x_2$ times, and the rth $x_r$ times is 

$$\frac{n!}{x_1!\, x_2! \cdots x_r!}\, p_1^{x_1} p_2^{x_2} \cdots p_r^{x_r} \quad \text{when } x_1 + x_2 + \cdots + x_r = n$$

This is known as the multinomial distribution. 

18. Show that when r = 2 the multinomial reduces to the binomial. 

19. In Exercise 17, let $X_i$ denote the number of times the ith outcome appears, 
$i = 1, \ldots, r$. What is the probability mass function of $X_1 + X_2 + \cdots + X_k$? 

20. A television store owner figures that 50 percent of the customers entering his store 
will purchase an ordinary television set, 20 percent will purchase a color television 
set, and 30 percent will just be browsing. If five customers enter his store on a 
certain day, what is the probability that two customers purchase color sets, one 
customer purchases an ordinary set, and two customers purchase nothing? 

21. In Exercise 20, what is the probability that our store owner sells three or more 
televisions on that day? 

22. If a fair coin is successively flipped, find the probability that a head first appears on 
the fifth trial. 

*23. A coin having probability p of coming up heads is successively flipped until the rth 
head appears. Argue that X, the number of flips required, will be n, $n \geq r$, with 
probability 

$$P\{X = n\} = \binom{n-1}{r-1} p^r (1 - p)^{n-r}, \quad n \geq r$$

This is known as the negative binomial distribution. 
Hint: How many successes must there be in the first n − 1 trials? 

24. The probability mass function of X is given by 

$$p(k) = \binom{r+k-1}{r-1} p^r (1 - p)^k, \quad k = 0, 1, \ldots$$

Give a possible interpretation of the random variable X. 
Hint: See Exercise 23. 

In Exercises 25 and 26, suppose that two teams are playing a series of games, 
each of which is independently won by team A with probability p and by team B 
with probability 1 − p. The winner of the series is the first team to win i games. 

25. If i = 4, find the probability that a total of 7 games are played. Also show that this 
probability is maximized when p = 1/2. 

26. Find the expected number of games that are played when 
(a) i = 2; 
(b) i = 3. 

In both cases, show that this number is maximized when p = 1/2. 

27. A fair coin is independently flipped n times, k times by A and n − k times by B. 
Show that the probability that A and B flip the same number of heads is equal to 
the probability that there are a total of k heads. 

28. Suppose that we want to generate a random variable X that is equally likely to 
be either 0 or 1, and that all we have at our disposal is a biased coin that, when 
flipped, lands on heads with some (unknown) probability p. Consider the following 
procedure: 

1. Flip the coin, and let $O_1$, either heads or tails, be the result. 
2. Flip the coin again, and let $O_2$ be the result. 
3. If $O_1$ and $O_2$ are the same, return to step 1. 
4. If $O_2$ is heads, set X = 0, otherwise set X = 1. 

(a) Show that the random variable X generated by this procedure is equally likely 
to be either 0 or 1. 
(b) Could we use a simpler procedure that continues to flip the coin until the last 
two flips are different, and then sets X = 0 if the final flip is a head, and sets 
X = 1 if it is a tail? 

29. Consider n independent flips of a coin having probability p of landing heads. Say 
a changeover occurs whenever an outcome differs from the one preceding it. For 
instance, if the results of the flips are H H T H T H H T, then there are a total of 
five changeovers. If p = 1/2, what is the probability there are k changeovers? 

30. Let X be a Poisson random variable with parameter $\lambda$. Show that P{X = i} increases 
monotonically and then decreases monotonically as i increases, reaching its maximum 
when i is the largest integer not exceeding $\lambda$. 
Hint: Consider P{X = i}/P{X = i − 1}. 

31. Compare the Poisson approximation with the correct binomial probability for the 
following cases: 
(a) P{X = 2} when n = 8, p = 0.1. 
(b) P{X = 9} when n = 10, p = 0.95. 
(c) P{X = 0} when n = 10, p = 0.1. 
(d) P{X = 4} when n = 9, p = 0.2. 

32. If you buy a lottery ticket in 50 lotteries, in each of which your chance of winning 
a prize is $\frac{1}{100}$, what is the (approximate) probability that you will win a prize (a) 
at least once, (b) exactly once, (c) at least twice? 

33. Let X be a random variable with probability density 

$$f(x) = \begin{cases} c(1 - x^2), & -1 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$

(a) What is the value of c? 
(b) What is the cumulative distribution function of X? 

34. Let the probability density of X be given by 

$$f(x) = \begin{cases} c(4x - 2x^2), & 0 < x < 2 \\ 0, & \text{otherwise} \end{cases}$$

(a) What is the value of c? 
(b) $P\{\frac{1}{2} < X < \frac{3}{2}\} = ?$ 

35. The density of X is given by 

$$f(x) = \begin{cases} 10/x^2, & \text{for } x > 10 \\ 0, & \text{for } x \leq 10 \end{cases}$$

What is the distribution of X? Find P{X > 20}. 

36. A point is uniformly distributed within the disk of radius 1. That is, its density is 

$$f(x, y) = C, \quad 0 \leq x^2 + y^2 \leq 1$$

Find the probability that its distance from the origin is less than x, $0 \leq x \leq 1$. 

37. Let $X_1, X_2, \ldots, X_n$ be independent random variables, each having a uniform 
distribution over (0, 1). Let $M = \text{maximum}(X_1, X_2, \ldots, X_n)$. Show that the distribution 
function of M, $F_M(\cdot)$, is given by 

$$F_M(x) = x^n, \quad 0 \leq x \leq 1$$

What is the probability density function of M? 

*38. If the density function of X equals 

$$f(x) = \begin{cases} c e^{-2x}, & 0 < x < \infty \\ 0, & x < 0 \end{cases}$$

find c. What is P{X > 2}? 

39. The random variable X has the following probability mass function: 

$$p(1) = \tfrac{1}{2}, \quad p(2) = \tfrac{1}{3}, \quad p(24) = \tfrac{1}{6}$$

Calculate E[X]. 

40. Suppose that two teams are playing a series of games, each of which is independently 
won by team A with probability p and by team B with probability 1 − p. The winner 
of the series is the first team to win four games. Find the expected number of games 
that are played, and evaluate this quantity when p = 1/2. 

41. Consider the case of arbitrary p in Exercise 29. Compute the expected number of 
changeovers. 

42. Suppose that each coupon obtained is, independent of what has been previously 
obtained, equally likely to be any of m different types. Find the expected number 
of coupons one needs to obtain in order to have at least one of each type. 

Hint: Let X be the number needed. It is useful to represent X by 

$$X = \sum_{i=1}^{m} X_i$$

where each $X_i$ is a geometric random variable. 

43. An urn contains n + m balls, of which n are red and m are black. They are 
withdrawn from the urn, one at a time and without replacement. Let X be the number 
of red balls removed before the first black ball is chosen. We are interested in 
determining E[X]. To obtain this quantity, number the red balls from 1 to n. Now define 
the random variables $X_i$, $i = 1, \ldots, n$, by 

$$X_i = \begin{cases} 1, & \text{if red ball } i \text{ is taken before any black ball is chosen} \\ 0, & \text{otherwise} \end{cases}$$

(a) Express X in terms of the $X_i$. 
(b) Find E[X]. 

44. In Exercise 43, let Y denote the number of red balls chosen after the first but before 
the second black ball has been chosen. 
(a) Express Y as the sum of n random variables, each of which is equal to either 
0 or 1. 
(b) Find E[Y]. 
(c) Compare E[Y] to E[X] obtained in Exercise 43. 
(d) Can you explain the result obtained in part (c)? 

45. A total of r keys are to be put, one at a time, in k boxes, with each key 
independently being put in box i with probability $p_i$, $\sum_{i=1}^{k} p_i = 1$. Each time a key is put 
in a nonempty box, we say that a collision occurs. Find the expected number of 
collisions. 

46. If X is a nonnegative integer valued random variable, show that 

(a) $$E[X] = \sum_{n=1}^{\infty} P\{X \geq n\} = \sum_{n=0}^{\infty} P\{X > n\}$$

Hint: Define the sequence of random variables $I_n$, $n \geq 1$, by 

$$I_n = \begin{cases} 1, & \text{if } n \leq X \\ 0, & \text{if } n > X \end{cases}$$

Now express X in terms of the $I_n$. 

(b) If X and Y are both nonnegative integer valued random variables, show 
that 

$$E[XY] = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} P(X \geq n, Y \geq m)$$

*47. Consider three trials, each of which is either a success or not. Let X denote the 
number of successes. Suppose that E[X] = 1.8. 
(a) What is the largest possible value of P{X = 3}? 
(b) What is the smallest possible value of P{X = 3}? 
In both cases, construct a probability scenario that results in P{X = 3} having the 
desired value. 

*48. If X is a nonnegative random variable, and g is a differentiable function with g(0) = 0, 
then 

$$E[g(X)] = \int_0^{\infty} P(X > t)\, g'(t)\, dt$$

Prove the preceding when X is a continuous random variable. 

*49. Prove that $E[X^2] \geq (E[X])^2$. When do we have equality? 

50. Let c be a constant. Show that 
(a) $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$; 
(b) Var(c + X) = Var(X). 

51. A coin, having probability p of landing heads, is flipped until a head appears for 
the rth time. Let N denote the number of flips required. Calculate E[N]. 

Hint: There is an easy way of doing this. It involves writing N as the sum of r 
geometric random variables. 

52. (a) Calculate E[X] for the maximum random variable of Exercise 37. 
(b) Calculate E[X] for X as in Exercise 33. 
(c) Calculate E[X] for X as in Exercise 34. 

53. If X is uniform over (0, 1), calculate $E[X^n]$ and $\mathrm{Var}(X^n)$. 

54. Let X and Y each take on either the value 1 or −1. Let 

$$\begin{aligned}
p(1, 1) &= P\{X = 1, Y = 1\}, \\
p(1, -1) &= P\{X = 1, Y = -1\}, \\
p(-1, 1) &= P\{X = -1, Y = 1\}, \\
p(-1, -1) &= P\{X = -1, Y = -1\}
\end{aligned}$$

Suppose that E[X] = E[Y] = 0. Show that 
(a) p(1, 1) = p(−1, −1); 
(b) p(1, −1) = p(−1, 1). 
Let p = 2p(1, 1). Find 
(c) Var(X); 
(d) Var(Y); 
(e) Cov(X, Y). 

55. Suppose that the joint probability mass function of X and Y is 

$$P(X = i, Y = j) = \binom{j}{i} e^{-2\lambda} \lambda^j / j!, \quad 0 \leq i \leq j$$

(a) Find the probability mass function of Y. 
(b) Find the probability mass function of X. 
(c) Find the probability mass function of Y − X. 

56. There are n types of coupons. Each newly obtained coupon is, independently, type 
i with probability $p_i$, $i = 1, \ldots, n$. Find the expected number and the variance of 
the number of distinct types obtained in a collection of k coupons. 

57. Suppose that X and Y are independent binomial random variables with parameters 
(n, p) and (m, p). Argue probabilistically (no computations necessary) that X + Y 
is binomial with parameters (n + m, p). 

58. An urn contains 2n balls, of which r are red. The balls are randomly removed in n 
successive pairs. Let X denote the number of pairs in which both balls are red. 
(a) Find E[X]. 
(b) Find Var(X). 

59. Let $X_1$, $X_2$, $X_3$, and $X_4$ be independent continuous random variables with a 
common distribution function F and let 

$$p = P\{X_1 < X_2 > X_3 < X_4\}$$

(a) Argue that the value of p is the same for all continuous distribution functions F. 
(b) Find p by integrating the joint density function over the appropriate region. 
(c) Find p by using the fact that all 4! possible orderings of $X_1, \ldots, X_4$ are equally 
likely. 

60. Calculate the moment generating function of the uniform distribution on (0, 1). 
Obtain E[X] and Var[X] by differentiating. 

61. Let X and W be the working and subsequent repair times of a certain machine. Let 
Y = X + W and suppose that the joint probability density of X and Y is 

$$f_{X,Y}(x, y) = e^{-y}, \quad 0 < x < y < \infty$$

(a) Find the density of X. 
(b) Find the density of Y. 
(c) Find the joint density of X and W. 
(d) Find the density of W. 

62. In deciding upon the appropriate premium to charge, insurance companies 
sometimes use the exponential principle, defined as follows. With X as the random 
amount that it will have to pay in claims, the premium charged by the insurance 
company is 

$$P = \frac{1}{a} \ln\left(E[e^{aX}]\right)$$

where a is some specified positive constant. Find P when X is an exponential random 
variable with parameter $\lambda$, and $a = \alpha\lambda$, where $0 < \alpha < 1$. 

63. Calculate the moment generating function of a geometric random variable. 

*64. Show that the sum of independent identically distributed exponential random 
variables has a gamma distribution. 

65. Consider Example 2.48. Find $\mathrm{Cov}(X_i, X_j)$ in terms of the $a_{rs}$. 

66. Use Chebyshev’s inequality to prove the weak law of large numbers. Namely, if 
$X_1, X_2, \ldots$ are independent and identically distributed with mean $\mu$ and variance 
$\sigma^2$ then, for any $\varepsilon > 0$, 

$$P\left\{\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| > \varepsilon\right\} \to 0 \quad \text{as } n \to \infty$$

67. Suppose that X is a random variable with mean 10 and variance 15. What can we 
say about P{5 < X < 15}? 

68. Let $X_1, X_2, \ldots, X_{10}$ be independent Poisson random variables with mean 1. 
(a) Use the Markov inequality to get a bound on $P\{X_1 + \cdots + X_{10} \geq 15\}$. 
(b) Use the central limit theorem to approximate $P\{X_1 + \cdots + X_{10} \geq 15\}$. 

69. If X is normally distributed with mean 1 and variance 4, use the tables to find 
P{2 < X < 3}. 

*70. Show that 

$$\lim_{n \to \infty} e^{-n} \sum_{k=0}^{n} \frac{n^k}{k!} = \frac{1}{2}$$

Hint: Let $X_n$ be Poisson with mean n. Use the central limit theorem to show that 
$P\{X_n \leq n\} \to \frac{1}{2}$. 

71. Let X denote the number of white balls selected when k balls are chosen at random 
from an urn containing n white and m black balls. 
(a) Compute P{X = i}. 
(b) Let, for $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$, 

$$X_i = \begin{cases} 1, & \text{if the } i\text{th ball selected is white} \\ 0, & \text{otherwise} \end{cases}$$

$$Y_j = \begin{cases} 1, & \text{if white ball } j \text{ is selected} \\ 0, & \text{otherwise} \end{cases}$$

Compute E[X] in two ways by expressing X first as a function of the $X_i$s and then 
of the $Y_j$s. 

72. Show that Var(X) = 1 when X is the number of men who select their own hats in 
Example 2.31. 

73. For the multinomial distribution (Exercise 17), let $N_i$ denote the number of times 
outcome i occurs. Find 
(a) $E[N_i]$; 
(b) $\mathrm{Var}(N_i)$; 
(c) $\mathrm{Cov}(N_i, N_j)$; 
(d) Compute the expected number of outcomes that do not occur. 

74. Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed continuous 
random variables. We say that a record occurs at time n if $X_n > \max(X_1, \ldots, X_{n-1})$. 
That is, $X_n$ is a record if it is larger than each of $X_1, \ldots, X_{n-1}$. Show 
(a) P{a record occurs at time n} = 1/n; 
(b) E[number of records by time n] = $\sum_{i=1}^{n} 1/i$; 
(c) Var(number of records by time n) = $\sum_{i=1}^{n} (i - 1)/i^2$; 
(d) Let N = min{n: n > 1 and a record occurs at time n}. Show E[N] = $\infty$. 

Hint: For (b) and (c) represent the number of records as the sum of indicator (that 
is, Bernoulli) random variables. 

75. Let $a_1 < a_2 < \cdots < a_n$ denote a set of n numbers, and consider any permutation 
of these numbers. We say that there is an inversion of $a_i$ and $a_j$ in the permutation 
if i < j and $a_j$ precedes $a_i$. For instance the permutation 4, 2, 1, 5, 3 has 5 
inversions: (4, 2), (4, 1), (4, 3), (2, 1), (5, 3). Consider now a random permutation 
of $a_1, a_2, \ldots, a_n$ (in the sense that each of the n! permutations is equally likely to 
be chosen) and let N denote the number of inversions in this permutation. Also, let 

$$N_i = \text{number of } k\colon k < i,\ a_i \text{ precedes } a_k \text{ in the permutation}$$

and note that $N = \sum_{i=1}^{n} N_i$. 

(a) Show that $N_1, \ldots, N_n$ are independent random variables. 
(b) What is the distribution of $N_i$? 
(c) Compute E[N] and Var(N). 

76. Let X and Y be independent random variables with means $\mu_x$ and $\mu_y$ and variances 
$\sigma_x^2$ and $\sigma_y^2$. Show that 

$$\mathrm{Var}(XY) = \sigma_x^2 \sigma_y^2 + \mu_y^2 \sigma_x^2 + \mu_x^2 \sigma_y^2$$

77. Let X and Y be independent normal random variables, each having parameters $\mu$ 
and $\sigma^2$. Show that X + Y is independent of X − Y. 
Hint: Find their joint moment generating function. 

78. Let $\phi(t_1, \ldots, t_n)$ denote the joint moment generating function of $X_1, \ldots, X_n$. 
(a) Explain how the moment generating function of $X_i$, $\phi_{X_i}(t_i)$, can be obtained 
from $\phi(t_1, \ldots, t_n)$. 
(b) Show that $X_1, \ldots, X_n$ are independent if and only if 

$$\phi(t_1, \ldots, t_n) = \phi_{X_1}(t_1) \cdots \phi_{X_n}(t_n)$$

79. With $K(t) = \log(E[e^{tX}])$, show that 

$$K'(0) = E[X], \quad K''(0) = \mathrm{Var}(X)$$

80. Let X denote the number of the events $A_1, \ldots, A_n$ that occur. Express E[X], 
Var(X), and $E\left[\binom{X}{k}\right]$ in terms of the quantities $S_k = \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k})$, 
$k = 1, \ldots, n$. 


3.1 Introduction 


One of the most useful concepts in probability theory is that of conditional 
probability and conditional expectation. The reason is twofold. First, in practice, 
we are often interested in calculating probabilities and expectations when some 
partial information is available; hence, the desired probabilities and expectations 
are conditional ones. Secondly, in calculating a desired probability or expecta- 
tion it is often extremely useful to first “condition” on some appropriate random 
variable. 


3.2 The Discrete Case 


Recall that for any two events E and F, the conditional probability of E given F 
is defined, as long as P(F) > 0, by 


$$P(E \mid F) = \frac{P(EF)}{P(F)}$$


Hence, if X and Y are discrete random variables, then it is natural to define
the conditional probability mass function of X given that Y = y, by


$$p_{X|Y}(x|y) = P\{X = x \mid Y = y\} = \frac{P\{X = x, Y = y\}}{P\{Y = y\}} = \frac{p(x, y)}{p_Y(y)}$$


for all values of y such that P{Y = y} > 0. Similarly, the conditional probability
distribution function of X given that Y = y is defined, for all y such that
P{Y = y} > 0, by


$$F_{X|Y}(x|y) = P\{X \le x \mid Y = y\} = \sum_{a \le x} p_{X|Y}(a|y)$$


Finally, the conditional expectation of X given that Y = y is defined by 


$$E[X \mid Y = y] = \sum_x x\, P\{X = x \mid Y = y\} = \sum_x x\, p_{X|Y}(x|y)$$


In other words, the definitions are exactly as before with the exception that 
everything is now conditional on the event that Y = y. If X is independent of Y, 
then the conditional mass function, distribution, and expectation are the same as 
the unconditional ones. This follows, since if X is independent of Y, then 


$$p_{X|Y}(x|y) = P\{X = x \mid Y = y\} = P\{X = x\}$$


Example 3.1 Suppose that p(x,y), the joint probability mass function of X 
and Y, is given by 


$$p(1, 1) = 0.5, \quad p(1, 2) = 0.1, \quad p(2, 1) = 0.1, \quad p(2, 2) = 0.3$$


Calculate the conditional probability mass function of X given that Y = 1. 


Solution: We first note that 


$$p_Y(1) = \sum_x p(x, 1) = p(1, 1) + p(2, 1) = 0.6$$


Hence,


$$p_{X|Y}(1|1) = P\{X = 1 \mid Y = 1\} = \frac{P\{X = 1, Y = 1\}}{P\{Y = 1\}} = \frac{p(1, 1)}{p_Y(1)} = \frac{5}{6}$$


Similarly,


$$p_{X|Y}(2|1) = \frac{p(2, 1)}{p_Y(1)} = \frac{1}{6}$$
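
The calculation in Example 3.1 can be mirrored in a few lines of code. The following is only an illustrative sketch: the dictionary `joint` and the helper `conditional_pmf_x_given_y` are names introduced here, not part of the text, and exact fractions are used so that the results can be compared with 5/6 and 1/6 directly.

```python
from fractions import Fraction

# Joint pmf p(x, y) of Example 3.1, keyed by (x, y) and stored exactly.
joint = {
    (1, 1): Fraction(5, 10),
    (1, 2): Fraction(1, 10),
    (2, 1): Fraction(1, 10),
    (2, 2): Fraction(3, 10),
}


def conditional_pmf_x_given_y(joint, y):
    """Return {x: p_{X|Y}(x|y)}; only defined when P{Y = y} > 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}


cond = conditional_pmf_x_given_y(joint, 1)
# Conditional expectation E[X | Y = 1] computed from the conditional pmf.
cond_mean = sum(x * p for x, p in cond.items())
print(cond, cond_mean)
```

The same helper gives the conditional mass function given Y = 2 by changing the second argument.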


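Example 3.2 If $X_1$ and $X_2$ are independent binomial random variables with
respective parameters $(n_1, p)$ and $(n_2, p)$, calculate the conditional probability
mass function of $X_1$ given that $X_1 + X_2 = m$.


Solution: With $q = 1 - p$,


$$\begin{aligned}
P\{X_1 = k \mid X_1 + X_2 = m\} &= \frac{P\{X_1 = k,\ X_1 + X_2 = m\}}{P\{X_1 + X_2 = m\}} \\
&= \frac{P\{X_1 = k,\ X_2 = m - k\}}{P\{X_1 + X_2 = m\}} \\
&= \frac{P\{X_1 = k\}\, P\{X_2 = m - k\}}{P\{X_1 + X_2 = m\}} \\
&= \frac{\binom{n_1}{k} p^k q^{n_1 - k} \binom{n_2}{m-k} p^{m-k} q^{n_2 - m + k}}{\binom{n_1 + n_2}{m} p^m q^{n_1 + n_2 - m}}
\end{aligned}$$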


where we have used that $X_1 + X_2$ is a binomial random variable with parameters
$(n_1 + n_2, p)$ (see Example 2.44). Thus, the conditional probability mass
function of $X_1$, given that $X_1 + X_2 = m$, is


$$P\{X_1 = k \mid X_1 + X_2 = m\} = \frac{\binom{n_1}{k}\binom{n_2}{m-k}}{\binom{n_1 + n_2}{m}} \tag{3.1}$$


The distribution given by Equation (3.1), first seen in Example 2.35, is known
as the hypergeometric distribution. It is the distribution of the number of blue
balls that are chosen when a sample of m balls is randomly chosen from an
urn that contains $n_1$ blue and $n_2$ red balls. (To intuitively see why the conditional
distribution is hypergeometric, consider $n_1 + n_2$ independent trials
that each result in a success with probability p; let $X_1$ represent the number
of successes in the first $n_1$ trials and let $X_2$ represent the number of successes
in the final $n_2$ trials. Because all trials have the same probability of being a
success, each of the $\binom{n_1 + n_2}{m}$ subsets of m trials is equally likely to be the success
trials; thus, the number of the m success trials that are among the first $n_1$ trials
is a hypergeometric random variable.)
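
As a sanity check on Equation (3.1), the conditional probabilities can be computed directly from the binomial mass functions and compared with the hypergeometric formula. This is a minimal sketch with illustrative parameter values (`n1 = 4`, `n2 = 6`, `m = 5`) chosen here, not taken from the text; the function names are likewise hypothetical.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p, k):
    """P{Bin(n, p) = k}, returning 0 outside 0..n."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(n1, n2, p, m, k):
    """P{X1 = k | X1 + X2 = m}, straight from the definition."""
    numerator = binom_pmf(n1, p, k) * binom_pmf(n2, p, m - k)
    return numerator / binom_pmf(n1 + n2, p, m)


def hypergeometric_pmf(n1, n2, m, k):
    """Right side of Equation (3.1)."""
    if k < 0 or k > n1 or m - k < 0 or m - k > n2:
        return Fraction(0)
    return Fraction(comb(n1, k) * comb(n2, m - k), comb(n1 + n2, m))


n1, n2, m = 4, 6, 5
for p in (Fraction(3, 10), Fraction(7, 10)):
    direct = [conditional_pmf(n1, n2, p, m, k) for k in range(n1 + 1)]
    formula = [hypergeometric_pmf(n1, n2, m, k) for k in range(n1 + 1)]
    print(p, direct == formula)
```

Running the comparison for two different values of p illustrates that the conditional distribution does not depend on p, as the intuitive argument suggests.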


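Example 3.3 If X and Y are independent Poisson random variables with respective
means $\lambda_1$ and $\lambda_2$, calculate the conditional expected value of X given that
X + Y = n.


Solution: Let us first calculate the conditional probability mass function of X
given that X + Y = n. We obtain


$$\begin{aligned}
P\{X = k \mid X + Y = n\} &= \frac{P\{X = k,\ X + Y = n\}}{P\{X + Y = n\}} \\
&= \frac{P\{X = k,\ Y = n - k\}}{P\{X + Y = n\}} \\
&= \frac{P\{X = k\}\, P\{Y = n - k\}}{P\{X + Y = n\}}
\end{aligned}$$


where the last equality follows from the assumed independence of X and Y.
Recalling (see Example 2.37) that X + Y has a Poisson distribution with mean
$\lambda_1 + \lambda_2$, the preceding equation equals


$$\begin{aligned}
P\{X = k \mid X + Y = n\} &= \frac{e^{-\lambda_1} \lambda_1^k}{k!}\, \frac{e^{-\lambda_2} \lambda_2^{n-k}}{(n-k)!} \left[ \frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^n}{n!} \right]^{-1} \\
&= \frac{n!}{(n-k)!\, k!}\, \frac{\lambda_1^k \lambda_2^{n-k}}{(\lambda_1 + \lambda_2)^n} \\
&= \binom{n}{k} \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^k \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} \right)^{n-k}
\end{aligned}$$


In other words, the conditional distribution of X given that X + Y = n is the
binomial distribution with parameters n and $\lambda_1/(\lambda_1 + \lambda_2)$. Hence,


$$E[X \mid X + Y = n] = n\, \frac{\lambda_1}{\lambda_1 + \lambda_2}$$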
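
The conclusion of Example 3.3 can be checked numerically by evaluating the conditional mass function from its definition and comparing it with the binomial probabilities. The values `lam1 = 2.0`, `lam2 = 3.0` and `n = 6` below are arbitrary choices made for this sketch.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(mean, k):
    """P{Poisson(mean) = k}."""
    return exp(-mean) * mean**k / factorial(k)


def conditional_pmf(lam1, lam2, n, k):
    """P{X = k | X + Y = n} computed directly from the definition."""
    joint = poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k)
    return joint / poisson_pmf(lam1 + lam2, n)


lam1, lam2, n = 2.0, 3.0, 6
p = lam1 / (lam1 + lam2)

direct = [conditional_pmf(lam1, lam2, n, k) for k in range(n + 1)]
binomial = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
cond_mean = sum(k * q for k, q in enumerate(direct))

print(all(isclose(a, b) for a, b in zip(direct, binomial)))
print(cond_mean, n * p)
```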
Example 3.4 Consider an experiment that results in one of three possible
outcomes with outcome i occurring with probability $p_i$, i = 1, 2, 3, $\sum_{i=1}^{3} p_i = 1$.
Suppose that n independent replications of this experiment are performed and
let $X_i$, i = 1, 2, 3, denote the number of times outcome i appears. Determine the
conditional expectation of $X_1$ given that $X_2 = m$.


Solution: For $k \le n - m$,


$$P\{X_1 = k \mid X_2 = m\} = \frac{P\{X_1 = k,\ X_2 = m\}}{P\{X_2 = m\}}$$


Now if $X_1 = k$ and $X_2 = m$, then it follows that $X_3 = n - k - m$.
However,


$$P\{X_1 = k,\ X_2 = m,\ X_3 = n - k - m\} = \frac{n!}{k!\, m!\, (n - k - m)!}\, p_1^k p_2^m p_3^{n-k-m} \tag{3.2}$$


This follows since any particular sequence of the n experiments having outcome
1 appearing k times, outcome 2 m times, and outcome 3 (n − k − m) times
has probability $p_1^k p_2^m p_3^{n-k-m}$ of occurring. Since there are $n!/[k!\, m!\, (n - k - m)!]$
such sequences, Equation (3.2) follows.

Therefore, we have


$$P\{X_1 = k \mid X_2 = m\} = \frac{\dfrac{n!}{k!\, m!\, (n - k - m)!}\, p_1^k p_2^m p_3^{n-k-m}}{\dfrac{n!}{m!\, (n - m)!}\, p_2^m (1 - p_2)^{n-m}}$$


where we have used the fact that $X_2$ has a binomial distribution with parameters
n and $p_2$. Hence,


$$P\{X_1 = k \mid X_2 = m\} = \frac{(n - m)!}{k!\, (n - m - k)!} \left( \frac{p_1}{1 - p_2} \right)^k \left( \frac{p_3}{1 - p_2} \right)^{n-m-k}$$


or equivalently, writing $p_3 = 1 - p_1 - p_2$,


$$P\{X_1 = k \mid X_2 = m\} = \binom{n - m}{k} \left( \frac{p_1}{1 - p_2} \right)^k \left( 1 - \frac{p_1}{1 - p_2} \right)^{n-m-k}$$


In other words, the conditional distribution of $X_1$, given that $X_2 = m$, is binomial
with parameters n − m and $p_1/(1 - p_2)$. Consequently,


$$E[X_1 \mid X_2 = m] = (n - m)\, \frac{p_1}{1 - p_2}$$
Remarks


(i) The desired conditional probability in Example 3.4 could also have been computed 
in the following manner. Consider the n — m experiments that did not result in out- 
come 2. For each of these experiments, the probability that outcome 1 was obtained 
is given by 


$$P\{\text{outcome 1} \mid \text{not outcome 2}\} = \frac{P\{\text{outcome 1, not outcome 2}\}}{P\{\text{not outcome 2}\}} = \frac{p_1}{1 - p_2}$$


It therefore follows that, given X2 = m, the number of times outcome 1 occurs is 
binomially distributed with parameters n — m and p;/(1 — p2). 

(ii) Conditional expectations possess all of the properties of ordinary expectations. For 
instance, such identities as 


$$E\left[ \sum_{i=1}^{n} X_i \,\Big|\, Y = y \right] = \sum_{i=1}^{n} E[X_i \mid Y = y]$$


remain valid.


Example 3.5 There are n components. On a rainy day, component i will function
with probability $p_i$; on a nonrainy day, component i will function with probability
$q_i$, for i = 1, ..., n. It will rain tomorrow with probability $\alpha$. Calculate the
conditional expected number of components that function tomorrow, given that
it rains.


Solution: Let


$$X_i = \begin{cases} 1, & \text{if component } i \text{ functions tomorrow} \\ 0, & \text{otherwise} \end{cases}$$


Then, with Y defined to equal 1 if it rains tomorrow, and 0 otherwise, the
desired conditional expectation is obtained as follows.


$$E\left[ \sum_{i=1}^{n} X_i \,\Big|\, Y = 1 \right] = \sum_{i=1}^{n} E[X_i \mid Y = 1] = \sum_{i=1}^{n} p_i$$


3.3 The Continuous Case


If X and Y have a joint probability density function f(x, y), then the conditional
probability density function of X, given that Y = y, is defined for all values of y
such that $f_Y(y) > 0$, by


$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}$$


To motivate this definition, multiply the left side by dx and the right side by
(dx dy)/dy to get


$$\begin{aligned}
f_{X|Y}(x|y)\, dx &= \frac{f(x, y)\, dx\, dy}{f_Y(y)\, dy} \\
&\approx \frac{P\{x \le X \le x + dx,\ y \le Y \le y + dy\}}{P\{y \le Y \le y + dy\}} \\
&= P\{x \le X \le x + dx \mid y \le Y \le y + dy\}
\end{aligned}$$


In other words, for small values dx and dy, $f_{X|Y}(x|y)\, dx$ is approximately the
conditional probability that X is between x and x + dx given that Y is between
y and y + dy.

The conditional expectation of X, given that Y = y, is defined for all values of
y such that $f_Y(y) > 0$, by


$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, dx$$


Example 3.6 Suppose the joint density of X and Y is given by


$$f(x, y) = \begin{cases} 6xy(2 - x - y), & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}$$


Compute the conditional expectation of X given that Y = y, where 0 < y < 1.


Solution: We first compute the conditional density


$$\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{6xy(2 - x - y)}{\int_0^1 6xy(2 - x - y)\, dx} \\
&= \frac{6xy(2 - x - y)}{y(4 - 3y)} \\
&= \frac{6x(2 - x - y)}{4 - 3y}
\end{aligned}$$


Hence,


$$\begin{aligned}
E[X \mid Y = y] &= \int_0^1 \frac{6x^2(2 - x - y)}{4 - 3y}\, dx \\
&= \frac{(2 - y)2 - \frac{6}{4}}{4 - 3y} \\
&= \frac{5 - 4y}{8 - 6y}
\end{aligned}$$
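
The closed form (5 − 4y)/(8 − 6y) from Example 3.6 can be compared with a direct numerical evaluation of the defining integrals. The sketch below uses a simple midpoint rule; the helper names and the step count are choices made here for illustration and are not part of the text.

```python
def joint_density(x, y):
    """f(x, y) = 6xy(2 - x - y) on the open unit square, 0 elsewhere."""
    if 0 < x < 1 and 0 < y < 1:
        return 6 * x * y * (2 - x - y)
    return 0.0


def midpoint_integral(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def conditional_mean_x(y):
    """E[X | Y = y] as (integral of x f(x, y) dx) / f_Y(y)."""
    marginal = midpoint_integral(lambda x: joint_density(x, y), 0.0, 1.0)
    numerator = midpoint_integral(lambda x: x * joint_density(x, y), 0.0, 1.0)
    return numerator / marginal


for y in (0.25, 0.5, 0.75):
    print(y, conditional_mean_x(y), (5 - 4 * y) / (8 - 6 * y))
```

The two printed columns should agree to several decimal places for each y.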


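Example 3.7 Suppose the joint density of X and Y is given by


$$f(x, y) = \begin{cases} 4y(x - y)e^{-(x+y)}, & 0 < x < \infty,\ 0 \le y \le x \\ 0, & \text{otherwise} \end{cases}$$


Compute E[X|Y = y].


Solution: The conditional density of X, given that Y = y, is given by


$$\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{4y(x - y)e^{-(x+y)}}{\int_y^\infty 4y(x - y)e^{-(x+y)}\, dx}, \quad x > y \\
&= \frac{(x - y)e^{-x}}{\int_y^\infty (x - y)e^{-x}\, dx}, \quad x > y \\
&= \frac{(x - y)e^{-x}}{\int_0^\infty w e^{-(y+w)}\, dw}, \quad x > y \quad \text{(by letting } w = x - y\text{)} \\
&= (x - y)e^{-(x-y)}, \quad x > y
\end{aligned}$$


where the final equality used that $\int_0^\infty w e^{-w}\, dw$ is the expected value of an
exponential random variable with mean 1. Therefore, with W being exponential
with mean 1,


$$\begin{aligned}
E[X \mid Y = y] &= \int_y^\infty x(x - y)e^{-(x-y)}\, dx \\
&= \int_0^\infty (w + y) w e^{-w}\, dw \\
&= E[W^2] + yE[W] \\
&= 2 + y
\end{aligned}$$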
Example 3.8 The joint density of X and Y is given by


$$f(x, y) = \begin{cases} \frac{1}{2} y e^{-xy}, & 0 < x < \infty,\ 0 < y < 2 \\ 0, & \text{otherwise} \end{cases}$$


What is $E[e^{X/2} \mid Y = 1]$?


Solution: The conditional density of X, given that Y = 1, is given by


$$f_{X|Y}(x|1) = \frac{f(x, 1)}{f_Y(1)} = \frac{\frac{1}{2} e^{-x}}{\int_0^\infty \frac{1}{2} e^{-x}\, dx} = e^{-x}$$


Hence, by Proposition 2.1,


$$E[e^{X/2} \mid Y = 1] = \int_0^\infty e^{x/2} f_{X|Y}(x|1)\, dx = \int_0^\infty e^{x/2} e^{-x}\, dx = 2$$


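Example 3.9 Let $X_1$ and $X_2$ be independent exponential random variables with
rates $\mu_1$ and $\mu_2$. Find the conditional density of $X_1$ given that $X_1 + X_2 = t$.


Solution: To begin, let us first note that if f(x, y) is the joint density of X, Y,
then the joint density of X and X + Y is


$$f_{X, X+Y}(x, t) = f(x, t - x)$$


To verify the preceding, note that


$$\begin{aligned}
P\{X \le x,\ X + Y \le t\} &= \iint_{u \le x,\ u + v \le t} f(u, v)\, dv\, du \\
&= \int_{-\infty}^{x} \int_{-\infty}^{t - u} f(u, v)\, dv\, du \\
&= \int_{-\infty}^{x} \int_{-\infty}^{t} f(u, y - u)\, dy\, du
\end{aligned}$$


where the final equation made the change of variable v = y − u. Differentiating
the preceding joint distribution function first with respect to t and then with
respect to x yields the verification.

Applying the preceding to our example yields


$$\begin{aligned}
f_{X_1 | X_1 + X_2}(x|t) &= \frac{f_{X_1, X_1 + X_2}(x, t)}{f_{X_1 + X_2}(t)} \\
&= \frac{\mu_1 e^{-\mu_1 x}\, \mu_2 e^{-\mu_2 (t - x)}}{f_{X_1 + X_2}(t)}, \quad 0 \le x \le t \\
&= C e^{-(\mu_1 - \mu_2)x}, \quad 0 \le x \le t
\end{aligned}$$


where


$$C = \frac{\mu_1 \mu_2 e^{-\mu_2 t}}{f_{X_1 + X_2}(t)}$$


Now, if $\mu_1 = \mu_2$, then


$$f_{X_1 | X_1 + X_2}(x|t) = C, \quad 0 \le x \le t$$


yielding that C = 1/t and that $X_1$ given $X_1 + X_2 = t$ is uniformly distributed
on (0, t). On the other hand, if $\mu_1 \ne \mu_2$, then we use


$$1 = \int_0^t f_{X_1 | X_1 + X_2}(x|t)\, dx = \frac{C}{\mu_1 - \mu_2} \left( 1 - e^{-(\mu_1 - \mu_2)t} \right)$$


to obtain


$$C = \frac{\mu_1 - \mu_2}{1 - e^{-(\mu_1 - \mu_2)t}}$$


thus yielding the result:


$$f_{X_1 | X_1 + X_2}(x|t) = \frac{(\mu_1 - \mu_2) e^{-(\mu_1 - \mu_2)x}}{1 - e^{-(\mu_1 - \mu_2)t}}$$


An interesting byproduct of our analysis is that


$$f_{X_1 + X_2}(t) = \frac{\mu_1 \mu_2 e^{-\mu_2 t}}{C} = \begin{cases} \mu^2 t e^{-\mu t}, & \text{if } \mu_1 = \mu_2 = \mu \\[4pt] \dfrac{\mu_1 \mu_2 \left( e^{-\mu_2 t} - e^{-\mu_1 t} \right)}{\mu_1 - \mu_2}, & \text{if } \mu_1 \ne \mu_2 \end{cases}$$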

3.4 Computing Expectations by Conditioning 


Let us denote by E[X|Y] that function of the random variable Y whose value at
Y = y is E[X|Y = y]. Note that E[X|Y] is itself a random variable. An extremely
important property of conditional expectation is that for all random variables X
and Y


$$E[X] = E[E[X \mid Y]] \tag{3.3}$$


If Y is a discrete random variable, then Equation (3.3) states that


$$E[X] = \sum_y E[X \mid Y = y]\, P\{Y = y\} \tag{3.3a}$$


while if Y is continuous with density $f_Y(y)$, then Equation (3.3) says that


$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\, f_Y(y)\, dy \tag{3.3b}$$


We now give a proof of Equation (3.3) in the case where X and Y are both 
discrete random variables. 


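Proof of Equation (3.3) When X and Y Are Discrete. We must show that


$$E[X] = \sum_y E[X \mid Y = y]\, P\{Y = y\} \tag{3.4}$$


Now, the right side of the preceding can be written


$$\begin{aligned}
\sum_y E[X \mid Y = y]\, P\{Y = y\} &= \sum_y \sum_x x\, P\{X = x \mid Y = y\}\, P\{Y = y\} \\
&= \sum_y \sum_x x\, \frac{P\{X = x,\ Y = y\}}{P\{Y = y\}}\, P\{Y = y\} \\
&= \sum_y \sum_x x\, P\{X = x,\ Y = y\} \\
&= \sum_x x \sum_y P\{X = x,\ Y = y\} \\
&= \sum_x x\, P\{X = x\} \\
&= E[X]
\end{aligned}$$


and the result is obtained.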


One way to understand Equation (3.4) is to interpret it as follows. It states that 
to calculate E[X] we may take a weighted average of the conditional expected 
value of X given that Y = y, each of the terms E[X|Y = y] being weighted by the 
probability of the event on which it is conditioned. 
The following examples will indicate the usefulness of Equation (3.3).


Example 3.10 Sam will read either one chapter of his probability book or one 
chapter of his history book. If the number of misprints in a chapter of his prob- 
ability book is Poisson distributed with mean 2 and if the number of misprints 
in his history chapter is Poisson distributed with mean 5, then assuming Sam is 
equally likely to choose either book, what is the expected number of misprints 
that Sam will come across? 


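Solution: Letting X denote the number of misprints and letting


$$Y = \begin{cases} 1, & \text{if Sam chooses his history book} \\ 2, & \text{if Sam chooses his probability book} \end{cases}$$


then


$$\begin{aligned}
E[X] &= E[X \mid Y = 1]\, P\{Y = 1\} + E[X \mid Y = 2]\, P\{Y = 2\} \\
&= 5\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{2}\right) \\
&= \tfrac{7}{2}
\end{aligned}$$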


Example 3.11 (The Expectation of the Sum of a Random Number of Random 
Variables) Suppose that the expected number of accidents per week at an indus- 
trial plant is four. Suppose also that the numbers of workers injured in each 
accident are independent random variables with a common mean of 2. Assume 
also that the number of workers injured in each accident is independent of the 
number of accidents that occur. What is the expected number of injuries during 
a week? 


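Solution: Letting N denote the number of accidents and $X_i$ the number injured
in the ith accident, i = 1, 2, ..., then the total number of injuries can be
expressed as $\sum_{i=1}^{N} X_i$. Now,


$$E\left[ \sum_{i=1}^{N} X_i \right] = E\left[ E\left[ \sum_{i=1}^{N} X_i \,\Big|\, N \right] \right]$$


But


$$\begin{aligned}
E\left[ \sum_{i=1}^{N} X_i \,\Big|\, N = n \right] &= E\left[ \sum_{i=1}^{n} X_i \,\Big|\, N = n \right] \\
&= E\left[ \sum_{i=1}^{n} X_i \right] \quad \text{by the independence of } X_i \text{ and } N \\
&= nE[X]
\end{aligned}$$


which yields


$$E\left[ \sum_{i=1}^{N} X_i \,\Big|\, N \right] = NE[X]$$


and thus


$$E\left[ \sum_{i=1}^{N} X_i \right] = E[NE[X]] = E[N]E[X]$$


Therefore, in our example, the expected number of injuries during a week
equals 4 × 2 = 8.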


The random variable $\sum_{i=1}^{N} X_i$, equal to the sum of a random number N of independent
and identically distributed random variables that are also independent
of N, is called a compound random variable. As just shown in Example 3.11, the
expected value of a compound random variable is E[X]E[N]. Its variance will
be derived in Example 3.19.


Example 3.12 (The Mean of a Geometric Distribution) A coin, having probability 
p of coming up heads, is to be successively flipped until the first head appears. 
What is the expected number of flips required? 


Solution: Let N be the number of flips required, and let


$$Y = \begin{cases} 1, & \text{if the first flip results in a head} \\ 0, & \text{if the first flip results in a tail} \end{cases}$$


Now,


$$\begin{aligned}
E[N] &= E[N \mid Y = 1]\, P\{Y = 1\} + E[N \mid Y = 0]\, P\{Y = 0\} \\
&= pE[N \mid Y = 1] + (1 - p)E[N \mid Y = 0]
\end{aligned} \tag{3.5}$$


However,


$$E[N \mid Y = 1] = 1, \quad E[N \mid Y = 0] = 1 + E[N] \tag{3.6}$$


To see why Equation (3.6) is true, consider E[N|Y = 1]. Since Y = 1, we know
that the first flip resulted in heads and so, clearly, the expected number of flips
required is 1. On the other hand if Y = 0, then the first flip resulted in tails.
However, since the successive flips are assumed independent, it follows that,
after the first tail, the expected additional number of flips until the first head
is just E[N]. Hence E[N|Y = 0] = 1 + E[N]. Substituting Equation (3.6)
into Equation (3.5) yields


$$E[N] = p + (1 - p)(1 + E[N])$$


or


$$E[N] = 1/p$$


Because the random variable N is a geometric random variable with probability
mass function $p(n) = p(1 - p)^{n-1}$, its expectation could easily have been
computed from $E[N] = \sum_{n=1}^{\infty} n\, p(n)$ without recourse to conditional expectation.
However, if you attempt to obtain the solution to our next example without
using conditional expectation, you will quickly learn what a useful technique
“conditioning” can be.


Example 3.13 A miner is trapped in a mine containing three doors. The first 
door leads to a tunnel that takes him to safety after two hours of travel. The 
second door leads to a tunnel that returns him to the mine after three hours of 
travel. The third door leads to a tunnel that returns him to his mine after five 
hours. Assuming that the miner is at all times equally likely to choose any one of 
the doors, what is the expected length of time until the miner reaches safety? 


Solution: Let X denote the time until the miner reaches safety, and let Y denote
the door he initially chooses. Now,


$$\begin{aligned}
E[X] &= E[X \mid Y = 1]\, P\{Y = 1\} + E[X \mid Y = 2]\, P\{Y = 2\} + E[X \mid Y = 3]\, P\{Y = 3\} \\
&= \tfrac{1}{3} \left( E[X \mid Y = 1] + E[X \mid Y = 2] + E[X \mid Y = 3] \right)
\end{aligned}$$


However,


$$\begin{aligned}
E[X \mid Y = 1] &= 2, \\
E[X \mid Y = 2] &= 3 + E[X], \\
E[X \mid Y = 3] &= 5 + E[X]
\end{aligned} \tag{3.7}$$


To understand why this is correct consider, for instance, E[X|Y = 2], and reason
as follows. If the miner chooses the second door, then he spends three
hours in the tunnel and then returns to the mine. But once he returns to the
mine the problem is as before, and hence his expected additional time until
safety is just E[X]. Hence E[X|Y = 2] = 3 + E[X]. The argument behind the
other equalities in Equation (3.7) is similar. Hence,


$$E[X] = \tfrac{1}{3} (2 + 3 + E[X] + 5 + E[X]) \quad \text{or} \quad E[X] = 10$$
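
The conditioning argument of Example 3.13 reduces to one linear equation in E[X]. As an illustrative sketch (the table `DOORS`, the function names, the seed and the trial count are all choices made here, not part of the text), the code below solves that equation exactly and also estimates E[X] by simulating the miner's walk.

```python
import random
from fractions import Fraction

# (hours spent in the tunnel, whether the tunnel leads to safety), per door.
DOORS = [(2, True), (3, False), (5, False)]


def exact_expected_time():
    """Solve E[X] = sum_d P(d) * (hours_d + [back in mine] * E[X]) for E[X]."""
    p = Fraction(1, len(DOORS))
    constant = sum(p * hours for hours, _ in DOORS)
    return_prob = sum(p for _, safe in DOORS if not safe)
    return constant / (1 - return_prob)


def simulate_time(rng):
    """One realization of the time until the miner reaches safety."""
    total = 0
    while True:
        hours, safe = rng.choice(DOORS)
        total += hours
        if safe:
            return total


rng = random.Random(0)
trials = 100_000
estimate = sum(simulate_time(rng) for _ in range(trials)) / trials
print(exact_expected_time(), estimate)
```

The exact value is 10 hours; the simulated average should be close to it, up to Monte Carlo error.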
Example 3.14 (The Matching Rounds Problem) Suppose in Example 2.31 that
those choosing their own hats depart, while the others (those without a match)
put their selected hats in the center of the room, mix them up, and then reselect.
Also, suppose that this process continues until each individual has his
own hat.


(a) Find $E[R_n]$ where $R_n$ is the number of rounds that are necessary when n individuals
are initially present.

(b) Find $E[S_n]$ where $S_n$ is the total number of selections made by the n individuals,
$n \ge 2$.

(c) Find the expected number of false selections made by one of the n people, $n \ge 2$.


Solution: (a) It follows from the results of Example 2.31 that no matter how
many people remain there will, on average, be one match per round. Hence,
one might suggest that $E[R_n] = n$. This turns out to be true, and an induction
proof will now be given. Because it is obvious that $E[R_1] = 1$, assume that
$E[R_k] = k$ for k = 1, ..., n − 1. To compute $E[R_n]$, start by conditioning on
$X_n$, the number of matches that occur in the first round. This gives


$$E[R_n] = \sum_{i=0}^{n} E[R_n \mid X_n = i]\, P\{X_n = i\}$$


Now, given a total of i matches in the initial round, the number of rounds
needed will equal 1 plus the number of rounds that are required when n − i
persons are to be matched with their hats. Therefore,


$$\begin{aligned}
E[R_n] &= \sum_{i=0}^{n} (1 + E[R_{n-i}])\, P\{X_n = i\} \\
&= 1 + E[R_n]\, P\{X_n = 0\} + \sum_{i=1}^{n} E[R_{n-i}]\, P\{X_n = i\} \\
&= 1 + E[R_n]\, P\{X_n = 0\} + \sum_{i=1}^{n} (n - i)\, P\{X_n = i\} \quad \text{by the induction hypothesis} \\
&= 1 + E[R_n]\, P\{X_n = 0\} + n(1 - P\{X_n = 0\}) - E[X_n] \\
&= E[R_n]\, P\{X_n = 0\} + n(1 - P\{X_n = 0\})
\end{aligned}$$


where the final equality used the result, established in Example 2.31, that
$E[X_n] = 1$. Since the preceding equation implies that $E[R_n] = n$, the result
is proven.
(b) For $n \ge 2$, conditioning on $X_n$, the number of matches in round 1, gives


$$\begin{aligned}
E[S_n] &= \sum_{i=0}^{n} E[S_n \mid X_n = i]\, P\{X_n = i\} \\
&= \sum_{i=0}^{n} (n + E[S_{n-i}])\, P\{X_n = i\} \\
&= n + \sum_{i=0}^{n} E[S_{n-i}]\, P\{X_n = i\}
\end{aligned}$$


where $E[S_0] = 0$. To solve the preceding equation, rewrite it as


$$E[S_n] = n + E[S_{n - X_n}]$$


Now, if there were exactly one match in each round, then it would take a total
of 1 + 2 + ··· + n = n(n + 1)/2 selections. Thus, let us try a solution of
the form $E[S_n] = an + bn^2$. For the preceding equation to be satisfied by a
solution of this type, for $n \ge 2$, we need


$$an + bn^2 = n + E[a(n - X_n) + b(n - X_n)^2]$$


or, equivalently,


$$an + bn^2 = n + a(n - E[X_n]) + b(n^2 - 2nE[X_n] + E[X_n^2])$$


Now, using the results of Example 2.31 and Exercise 72 of Chapter 2 that
$E[X_n] = \mathrm{Var}(X_n) = 1$, the preceding will be satisfied if


$$an + bn^2 = n + an - a + bn^2 - 2nb + 2b$$


and this will be valid provided that b = 1/2, a = 1. That is,


$$E[S_n] = n + n^2/2$$


satisfies the recursive equation for $E[S_n]$.

The formal proof that $E[S_n] = n + n^2/2$, $n \ge 2$, is obtained by induction
on n. It is true when n = 2 (since, in this case, the number of selections is
twice the number of rounds and the number of rounds is a geometric random
variable with parameter p = 1/2). Now, the recursion gives


$$E[S_n] = n + E[S_n]\, P\{X_n = 0\} + \sum_{i=1}^{n} E[S_{n-i}]\, P\{X_n = i\}$$


Hence, upon assuming that $E[S_0] = E[S_1] = 0$, $E[S_k] = k + k^2/2$, for k =
2, ..., n − 1 and using that $P\{X_n = n - 1\} = 0$, we see that


$$\begin{aligned}
E[S_n] &= n + E[S_n]\, P\{X_n = 0\} + \sum_{i=1}^{n} \left[ n - i + (n - i)^2/2 \right] P\{X_n = i\} \\
&= n + E[S_n]\, P\{X_n = 0\} + (n + n^2/2)(1 - P\{X_n = 0\}) - (n + 1)E[X_n] + E[X_n^2]/2
\end{aligned}$$


Substituting the identities $E[X_n] = 1$, $E[X_n^2] = 2$ in the preceding shows that


$$E[S_n] = n + n^2/2$$


and the induction proof is complete.

(c) If we let $C_j$ denote the number of hats chosen by person j, j = 1, ..., n, then


$$\sum_{j=1}^{n} C_j = S_n$$


Taking expectations, and using the fact that each $C_j$ has the same mean, yields
the result


$$E[C_j] = E[S_n]/n = 1 + n/2$$


Hence, the expected number of false selections by person j is


$$E[C_j - 1] = n/2$$
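
The recursions of parts (a) and (b) can also be evaluated exactly for small n, which gives an independent check of the formulas for the expected number of rounds and selections. The sketch below is an illustration under one stated assumption that is not spelled out in this passage: the first-round match count is taken to have the standard distribution P{X_n = i} = C(n, i) D_{n−i} / n!, where D_k is the number of derangements of k items. All function names are introduced here.

```python
from fractions import Fraction
from math import comb, factorial


def derangements(n):
    """Number of permutations of n items with no fixed point (D_0 = 1, D_1 = 0)."""
    d = [1, 0]
    for k in range(2, n + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[n]


def match_pmf(n):
    """[P{X_n = i} for i = 0..n]: choose the i matched people, derange the rest."""
    return [Fraction(comb(n, i) * derangements(n - i), factorial(n))
            for i in range(n + 1)]


def expected_rounds_and_selections(n_max):
    """Solve the conditioning recursions for E[R_n] and E[S_n], n = 0..n_max."""
    rounds = [Fraction(0)]      # E[R_0] = 0
    selections = [Fraction(0)]  # E[S_0] = 0
    for n in range(1, n_max + 1):
        pmf = match_pmf(n)
        tail_r = sum(rounds[n - i] * pmf[i] for i in range(1, n + 1))
        tail_s = sum(selections[n - i] * pmf[i] for i in range(1, n + 1))
        # Move the E[.] P{X_n = 0} term to the left side and divide.
        rounds.append((1 + tail_r) / (1 - pmf[0]))
        selections.append((n + tail_s) / (1 - pmf[0]))
    return rounds, selections


rounds, selections = expected_rounds_and_selections(8)
for n in range(2, 9):
    print(n, rounds[n], selections[n], n + Fraction(n * n, 2))
```

The value this recursion produces for one person plays no role for larger n, because it is always multiplied by P{X_n = n − 1} = 0.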


Example 3.15 Independent trials, each of which is a success with probability p, 
are performed until there are k consecutive successes. What is the mean number 
of necessary trials? 


Solution: Let $N_k$ denote the number of necessary trials to obtain k consecutive
successes, and let $M_k$ denote its mean. We will obtain a recursive equation for
$M_k$ by conditioning on $N_{k-1}$, the number of trials needed for k − 1 consecutive
successes. This yields


$$M_k = E[N_k] = E[E[N_k \mid N_{k-1}]]$$


Now,


$$E[N_k \mid N_{k-1}] = N_{k-1} + 1 + (1 - p)E[N_k]$$


where the preceding follows since if it takes $N_{k-1}$ trials to obtain k − 1
consecutive successes, then either the next trial is a success and we have our
k in a row or it is a failure and we must begin anew. Taking expectations of
both sides of the preceding yields


$$M_k = M_{k-1} + 1 + (1 - p)M_k$$


or


$$M_k = \frac{1}{p} + \frac{M_{k-1}}{p}$$


Since $N_1$, the time of the first success, is geometric with parameter p, we see that


$$M_1 = \frac{1}{p}$$


and, recursively


$$M_2 = \frac{1}{p} + \frac{1}{p^2}, \qquad M_3 = \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3}$$


and, in general,


$$M_k = \frac{1}{p} + \frac{1}{p^2} + \cdots + \frac{1}{p^k}$$
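
The recursion for the mean number of trials and its closed form are easy to compare in code. The following sketch iterates the recursion starting from the convention M_0 = 0 (an assumption made here so that the first step reproduces M_1 = 1/p); the function names are illustrative.

```python
from fractions import Fraction


def mean_trials_for_run(k, p):
    """Iterate M_j = 1/p + M_{j-1}/p for j = 1..k, starting from M_0 = 0."""
    m = Fraction(0)
    for _ in range(k):
        m = 1 / p + m / p
    return m


def closed_form(k, p):
    """M_k = 1/p + 1/p^2 + ... + 1/p^k."""
    return sum(1 / p**j for j in range(1, k + 1))


p = Fraction(1, 2)
for k in range(1, 6):
    print(k, mean_trials_for_run(k, p), closed_form(k, p))
```

For a fair coin (p = 1/2) the mean waiting time for k heads in a row is 2 + 4 + ··· + 2^k, which grows geometrically in k.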


Example 3.16 (Analyzing the Quick-Sort Algorithm) Suppose we are given a set
of n distinct values, $x_1, \ldots, x_n$, and we desire to put these values in increasing
order or, as it is commonly called, to sort them. An efficient procedure for
accomplishing this is the quick-sort algorithm, which is defined recursively as
follows: When n = 2 the algorithm compares the two values and puts them
in the appropriate order. When n > 2 it starts by choosing at random one of
the n values (say, $x_i$) and then compares each of the other n − 1 values with
$x_i$, noting which are smaller and which are larger than $x_i$. Letting $S_i$ denote the
set of elements smaller than $x_i$, and $\bar{S}_i$ the set of elements greater than $x_i$, the
algorithm now sorts the set $S_i$ and the set $\bar{S}_i$. The final ordering, therefore, consists
of the ordered set of the elements in $S_i$, then $x_i$, and then the ordered set
of the elements in $\bar{S}_i$. For instance, suppose that the set of elements is 10, 5, 8,
2, 1, 4, 7. We start by choosing one of these values at random (that is, each
of the 7 values has probability 1/7 of being chosen). Suppose, for instance,
that the value 4 is chosen. We then compare 4 with each of the other six values
to obtain


{2, 1}, 4, {10, 5, 8, 7}


We now sort the set {2, 1} to obtain


1, 2, 4, {10, 5, 8, 7}


Next we choose a value at random from {10, 5, 8, 7} (say 7 is chosen) and compare
each of the other three values with 7 to obtain


1, 2, 4, 5, 7, {10, 8}


Finally, we sort {10, 8} to end up with


1, 2, 4, 5, 7, 8, 10


One measure of the effectiveness of this algorithm is the expected number of
comparisons that it makes. Let us denote by $M_n$ the expected number of comparisons
needed by the quick-sort algorithm to sort a set of n distinct values.
To obtain a recursion for $M_n$ we condition on the rank of the initial value selected
to obtain


$$M_n = \sum_{j=1}^{n} E[\text{number of comparisons} \mid \text{value selected is } j\text{th smallest}]\, \frac{1}{n}$$


Now, if the initial value selected is the jth smallest, then the set of values smaller
than it is of size j − 1, and the set of values greater than it is of size n − j. Hence,
as n − 1 comparisons with the initial value chosen must be made, we see that


$$\begin{aligned}
M_n &= \sum_{j=1}^{n} (n - 1 + M_{j-1} + M_{n-j})\, \frac{1}{n} \\
&= n - 1 + \frac{2}{n} \sum_{k=1}^{n-1} M_k \quad (\text{since } M_0 = 0)
\end{aligned}$$


or, equivalently,


$$nM_n = n(n - 1) + 2 \sum_{k=1}^{n-1} M_k$$
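To solve the preceding, note that upon replacing n by n + 1 we obtain


$$(n + 1)M_{n+1} = (n + 1)n + 2 \sum_{k=1}^{n} M_k$$


Hence, upon subtraction,


$$(n + 1)M_{n+1} - nM_n = 2n + 2M_n$$


or


$$(n + 1)M_{n+1} = (n + 2)M_n + 2n$$


Therefore,


$$\frac{M_{n+1}}{n + 2} = \frac{2n}{(n + 1)(n + 2)} + \frac{M_n}{n + 1}$$


Iterating this gives


$$\begin{aligned}
\frac{M_{n+1}}{n + 2} &= \frac{2n}{(n + 1)(n + 2)} + \frac{2(n - 1)}{n(n + 1)} + \frac{M_{n-1}}{n} \\
&= \cdots \\
&= 2 \sum_{k=0}^{n-1} \frac{n - k}{(n + 1 - k)(n + 2 - k)} \quad \text{since } M_1 = 0
\end{aligned}$$


Hence,


$$\begin{aligned}
M_{n+1} &= 2(n + 2) \sum_{k=0}^{n-1} \frac{n - k}{(n + 1 - k)(n + 2 - k)} \\
&= 2(n + 2) \sum_{i=1}^{n} \frac{i}{(i + 1)(i + 2)}, \quad n \ge 1
\end{aligned}$$


Using the identity i/((i + 1)(i + 2)) = 2/(i + 2) − 1/(i + 1), we can approximate
$M_{n+1}$ for large n as follows:


$$\begin{aligned}
M_{n+1} &= 2(n + 2) \left[ \sum_{i=1}^{n} \frac{2}{i + 2} - \sum_{i=1}^{n} \frac{1}{i + 1} \right] \\
&\sim 2(n + 2) \left[ \int_3^{n+2} \frac{2}{x}\, dx - \int_2^{n+1} \frac{1}{x}\, dx \right] \\
&= 2(n + 2) \left[ 2\log(n + 2) - \log(n + 1) + \log 2 - 2\log 3 \right] \\
&= 2(n + 2) \left[ \log(n + 2) + \log\frac{n + 2}{n + 1} + \log 2 - 2\log 3 \right] \\
&\sim 2(n + 2) \log(n + 2)
\end{aligned}$$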
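
The recursion for the expected number of quick-sort comparisons and the closed-form sum derived from it can be checked against each other exactly. The sketch below is illustrative only: it shifts the index of the closed form from n + 1 to n, and the names `expected_comparisons` and `closed_form` are introduced here.

```python
from fractions import Fraction
from math import log


def expected_comparisons(n_max):
    """M_0..M_{n_max} from M_n = n - 1 + (2/n) * (M_1 + ... + M_{n-1}), M_0 = M_1 = 0."""
    m = [Fraction(0), Fraction(0)]
    running_sum = Fraction(0)  # holds M_1 + ... + M_{n-1}
    for n in range(2, n_max + 1):
        running_sum += m[n - 1]
        m.append(n - 1 + Fraction(2, n) * running_sum)
    return m


def closed_form(n):
    """M_n = 2(n + 1) * sum_{i=1}^{n-1} i / ((i + 1)(i + 2)), valid for n >= 2."""
    return 2 * (n + 1) * sum(Fraction(i, (i + 1) * (i + 2)) for i in range(1, n))


m = expected_comparisons(200)
for n in (2, 3, 10, 50, 200):
    # Last column: ratio of the exact mean to the leading term 2(n + 1) log(n + 1).
    print(n, float(m[n]), float(m[n]) / (2 * (n + 1) * log(n + 1)))
```

The printed ratio approaches 1 only slowly, because the terms dropped in the final approximation are of order n, which is smaller than n log n by just a logarithmic factor.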


Although we usually employ the conditional expectation identity to more easily 
enable us to compute an unconditional expectation, in our next example we show 
how it can sometimes be used to obtain the conditional expectation. 


Example 3.17 In the match problem of Example 2.31 involving n, n > 1, individuals,
find the conditional expected number of matches given that the first
person did not have a match.


Solution: Let X denote the number of matches, and let $X_1$ equal 1 if the first
person has a match and 0 otherwise. Then,


$$\begin{aligned}
E[X] &= E[X \mid X_1 = 0]\, P\{X_1 = 0\} + E[X \mid X_1 = 1]\, P\{X_1 = 1\} \\
&= E[X \mid X_1 = 0]\, \frac{n - 1}{n} + E[X \mid X_1 = 1]\, \frac{1}{n}
\end{aligned}$$


But, from Example 2.31


$$E[X] = 1$$


Moreover, given that the first person has a match, the expected number of
matches is equal to 1 plus the expected number of matches when n − 1 people
select among their own n − 1 hats, showing that


$$E[X \mid X_1 = 1] = 2$$


Therefore, we obtain the result


$$E[X \mid X_1 = 0] = \frac{n - 2}{n - 1}$$


3.4.1 Computing Variances by Conditioning


Conditional expectations can also be used to compute the variance of a random 
variable. Specifically, we can use 


$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$


and then use conditioning to obtain both E[X] and $E[X^2]$. We illustrate this
technique by determining the variance of a geometric random variable.
Example 3.18 (Variance of the Geometric Random Variable) Independent trials,
each resulting in a success with probability p, are performed in sequence. Let N
be the trial number of the first success. Find Var(N).


Solution: Let Y = 1 if the first trial results in a success, and Y = 0 otherwise.
Now,


$$\mathrm{Var}(N) = E[N^2] - (E[N])^2$$


To calculate $E[N^2]$ and E[N] we condition on Y. For instance,


$$E[N^2] = E[E[N^2 \mid Y]]$$


However,


$$\begin{aligned}
E[N^2 \mid Y = 1] &= 1, \\
E[N^2 \mid Y = 0] &= E[(1 + N)^2]
\end{aligned}$$


These two equations are true since if the first trial results in a success, then
clearly N = 1 and so $N^2 = 1$. On the other hand, if the first trial results in
a failure, then the total number of trials necessary for the first success will
equal one (the first trial that results in failure) plus the necessary number of
additional trials. Since this latter quantity has the same distribution as N, we
get that $E[N^2 \mid Y = 0] = E[(1 + N)^2]$. Hence, we see that


$$\begin{aligned}
E[N^2] &= E[N^2 \mid Y = 1]\, P\{Y = 1\} + E[N^2 \mid Y = 0]\, P\{Y = 0\} \\
&= p + E[(1 + N)^2](1 - p) \\
&= 1 + (1 - p)E[2N + N^2]
\end{aligned}$$


Since, as was shown in Example 3.12, E[N] = 1/p, this yields


$$E[N^2] = 1 + \frac{2(1 - p)}{p} + (1 - p)E[N^2]$$


or


$$E[N^2] = \frac{2 - p}{p^2}$$


Therefore,


$$\begin{aligned}
\mathrm{Var}(N) &= E[N^2] - (E[N])^2 \\
&= \frac{2 - p}{p^2} - \left( \frac{1}{p} \right)^2 \\
&= \frac{1 - p}{p^2}
\end{aligned}$$
Another way to use conditioning to obtain the variance of a random variable
is to apply the conditional variance formula. The conditional variance of X given
that Y = y is defined by


$$\mathrm{Var}(X \mid Y = y) = E[(X - E[X \mid Y = y])^2 \mid Y = y]$$


That is, the conditional variance is defined in exactly the same manner as the
ordinary variance with the exception that all probabilities are determined conditional
on the event that Y = y. Expanding the right side of the preceding and
taking expectation term by term yields


$$\mathrm{Var}(X \mid Y = y) = E[X^2 \mid Y = y] - (E[X \mid Y = y])^2$$


Letting Var(X|Y) denote that function of Y whose value when Y = y is
Var(X|Y = y), we have the following result.


Proposition 3.1 (The Conditional Variance Formula)


$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y]) \tag{3.8}$$


Proof.


$$\begin{aligned}
E[\mathrm{Var}(X \mid Y)] &= E[E[X^2 \mid Y] - (E[X \mid Y])^2] \\
&= E[E[X^2 \mid Y]] - E[(E[X \mid Y])^2] \\
&= E[X^2] - E[(E[X \mid Y])^2]
\end{aligned}$$


and


$$\begin{aligned}
\mathrm{Var}(E[X \mid Y]) &= E[(E[X \mid Y])^2] - (E[E[X \mid Y]])^2 \\
&= E[(E[X \mid Y])^2] - (E[X])^2
\end{aligned}$$


Therefore,


$$E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y]) = E[X^2] - (E[X])^2$$


which completes the proof.


Example 3.19 (The Variance of a Compound Random Variable) Let $X_1, X_2, \ldots$
be independent and identically distributed random variables with distribution
F having mean $\mu$ and variance $\sigma^2$, and assume that they are independent
of the nonnegative integer valued random variable N. As noted in Example 3.11,
where its expected value was determined, the random variable $S = \sum_{i=1}^{N} X_i$ is
called a compound random variable. Find its variance.


Solution: Whereas we could obtain $E[S^2]$ by conditioning on N, let us instead
use the conditional variance formula. Now,


$$\begin{aligned}
\mathrm{Var}(S \mid N = n) &= \mathrm{Var}\left( \sum_{i=1}^{N} X_i \,\Big|\, N = n \right) \\
&= \mathrm{Var}\left( \sum_{i=1}^{n} X_i \,\Big|\, N = n \right) \\
&= \mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) \\
&= n\sigma^2
\end{aligned}$$


By the same reasoning,


$$E[S \mid N = n] = n\mu$$


Therefore,


$$\mathrm{Var}(S \mid N) = N\sigma^2, \quad E[S \mid N] = N\mu$$


and the conditional variance formula gives


$$\mathrm{Var}(S) = E[N\sigma^2] + \mathrm{Var}(N\mu) = \sigma^2 E[N] + \mu^2 \mathrm{Var}(N)$$


If N is a Poisson random variable, then $S = \sum_{i=1}^{N} X_i$ is called a compound
Poisson random variable. Because the variance of a Poisson random variable
is equal to its mean, it follows that for a compound Poisson random variable
having $E[N] = \lambda$


$$\mathrm{Var}(S) = \lambda\sigma^2 + \lambda\mu^2 = \lambda E[X^2]$$


where X has the distribution F.
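
For distributions with finite support, the mean and variance of a compound random variable can be verified exactly by building the distribution of S through conditioning on N. The sketch below uses two small mass functions invented for illustration (N takes the values 2, 4, 6 and each X is 1 or 3 with equal probability); they are not taken from the text, and the function names are hypothetical.

```python
from fractions import Fraction


def mean_var(pmf):
    """Mean and variance of a pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    second = sum(v * v * p for v, p in pmf.items())
    return mean, second - mean**2


def convolve(a, b):
    """pmf of the sum of two independent variables with pmfs a and b."""
    out = {}
    for u, pu in a.items():
        for v, pv in b.items():
            out[u + v] = out.get(u + v, 0) + pu * pv
    return out


def compound_pmf(n_pmf, x_pmf):
    """pmf of S = X_1 + ... + X_N, obtained by conditioning on N = n."""
    result = {}
    n_fold = {0: Fraction(1)}  # pmf of the empty sum (N = 0)
    for n in range(max(n_pmf) + 1):
        weight = n_pmf.get(n, 0)
        if weight:
            for s, p in n_fold.items():
                result[s] = result.get(s, 0) + weight * p
        n_fold = convolve(n_fold, x_pmf)  # now the pmf of X_1 + ... + X_{n+1}
    return result


n_pmf = {2: Fraction(1, 4), 4: Fraction(1, 2), 6: Fraction(1, 4)}
x_pmf = {1: Fraction(1, 2), 3: Fraction(1, 2)}

mean_n, var_n = mean_var(n_pmf)
mu, sigma2 = mean_var(x_pmf)
mean_s, var_s = mean_var(compound_pmf(n_pmf, x_pmf))
print(mean_s, mu * mean_n)
print(var_s, sigma2 * mean_n + mu**2 * var_n)
```

Each printed pair should coincide, matching the mean E[N]E[X] from Example 3.11 and the variance formula just derived.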


Example 3.20 (The Variance in the Matching Rounds Problem) Consider the
matching rounds problem of Example 3.14, and let $V_n = \mathrm{Var}(R_n)$ denote the
variance of the number of rounds needed when there are initially n people. Using
the conditional variance formula, we will show that


$$V_n = n, \quad n \ge 2$$


The proof of the preceding is by induction on n. To begin, note that when n = 2
the number of rounds needed is geometric with parameter p = 1/2 and so


$$V_2 = \frac{1 - p}{p^2} = 2$$


So assume the induction hypothesis that


$$V_j = j, \quad 2 \le j < n$$


and now consider the case when there are n individuals. If X is the number of
matches in the first round then, conditional on X, the number of rounds $R_n$ is
distributed as 1 plus the number of rounds needed when there are initially n − X
individuals. Consequently,


$$\begin{aligned}
E[R_n \mid X] &= 1 + E[R_{n-X}] \\
&= 1 + n - X \quad \text{by Example 3.14}
\end{aligned}$$


Also, with $V_0 = 0$,


$$\mathrm{Var}(R_n \mid X) = \mathrm{Var}(R_{n-X}) = V_{n-X}$$


Hence, by the conditional variance formula


$$\begin{aligned}
V_n &= E[\mathrm{Var}(R_n \mid X)] + \mathrm{Var}(E[R_n \mid X]) \\
&= E[V_{n-X}] + \mathrm{Var}(X) \\
&= \sum_{j=0}^{n} V_{n-j}\, P(X = j) + \mathrm{Var}(X) \\
&= V_n P(X = 0) + \sum_{j=1}^{n} V_{n-j}\, P(X = j) + \mathrm{Var}(X)
\end{aligned}$$


Because P(X = n − 1) = 0, it follows from the preceding and the induction hypothesis
that


$$\begin{aligned}
V_n &= V_n P(X = 0) + \sum_{j=1}^{n} (n - j)\, P(X = j) + \mathrm{Var}(X) \\
&= V_n P(X = 0) + n(1 - P(X = 0)) - E[X] + \mathrm{Var}(X)
\end{aligned}$$


As it is easily shown (see Example 2.31 and Exercise 72 of Chapter 2) that E[X] =
Var(X) = 1, the preceding gives


$$V_n = V_n P(X = 0) + n(1 - P(X = 0))$$


thus proving the result.
3.5 Computing Probabilities by Conditioning


Not only can we obtain expectations by first conditioning on an appropriate
random variable, but we may also use this approach to compute probabilities.
To see this, let E denote an arbitrary event and define the indicator random
variable X by


$$X = \begin{cases} 1, & \text{if } E \text{ occurs} \\ 0, & \text{if } E \text{ does not occur} \end{cases}$$


It follows from the definition of X that


$$\begin{aligned}
E[X] &= P(E), \\
E[X \mid Y = y] &= P(E \mid Y = y), \quad \text{for any random variable } Y
\end{aligned}$$


Therefore, from Equations (3.3a) and (3.3b) we obtain


$$\begin{aligned}
P(E) &= \sum_y P(E \mid Y = y)\, P(Y = y), \quad \text{if } Y \text{ is discrete} \\
&= \int_{-\infty}^{\infty} P(E \mid Y = y)\, f_Y(y)\, dy, \quad \text{if } Y \text{ is continuous}
\end{aligned}$$


Example 3.21 Suppose that X and Y are independent continuous random vari- 
ables having densities fx and fy, respectively. Compute P{X < Y}. 


Solution: Conditioning on the value of Y yields 


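$$\begin{aligned}
P\{X < Y\} &= \int_{-\infty}^{\infty} P\{X < Y \mid Y = y\} f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} P\{X < y \mid Y = y\} f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} P\{X < y\} f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} F_X(y) f_Y(y)\, dy
\end{aligned}$$

where

$$F_X(y) = \int_{-\infty}^{y} f_X(x)\, dx$$

■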
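
As a numerical illustration of the final integral in Example 3.21, the following sketch evaluates $\int F_X(y) f_Y(y)\, dy$ by the midpoint rule. The choice of exponential distributions, the rates 2 and 3, the truncation point 40, and the function name `prob_x_less_than_y` are all assumptions made for this illustration; for exponentials the integral has the closed form `rate_x / (rate_x + rate_y)`, which gives something to compare against.

```python
import math

def prob_x_less_than_y(cdf_x, pdf_y, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of F_X(y) * f_Y(y) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * width
        total += cdf_x(y) * pdf_y(y)
    return total * width

# Illustrative assumption: X and Y are exponential with rates 2 and 3.
rate_x, rate_y = 2.0, 3.0
cdf_x = lambda y: 1.0 - math.exp(-rate_x * y)
pdf_y = lambda y: rate_y * math.exp(-rate_y * y)

approx = prob_x_less_than_y(cdf_x, pdf_y, 0.0, 40.0)   # tail beyond 40 is negligible
exact = rate_x / (rate_x + rate_y)                      # closed form for exponentials
print(approx, exact)
```

The same function can be reused with any other pair of a distribution function for $X$ and a density for $Y$, provided the integration range covers essentially all of the mass of $Y$.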


Example 3.22 An insurance company supposes that the number of accidents
that each of its policyholders will have in a year is Poisson distributed, with the
mean of the Poisson depending on the policyholder. If the Poisson mean of a
randomly chosen policyholder has a gamma distribution with density function

$$g(\lambda) = \lambda e^{-\lambda}, \quad \lambda \ge 0$$

what is the probability that a randomly chosen policyholder has exactly $n$ accidents
next year?


Solution: Let X denote the number of accidents that a randomly chosen pol- 
icyholder has next year. Letting Y be the Poisson mean number of accidents 
for this policyholder, then conditioning on Y yields 


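$$\begin{aligned}
P\{X = n\} &= \int_0^{\infty} P\{X = n \mid Y = \lambda\} g(\lambda)\, d\lambda \\
&= \int_0^{\infty} e^{-\lambda} \frac{\lambda^n}{n!} \lambda e^{-\lambda}\, d\lambda \\
&= \frac{1}{n!} \int_0^{\infty} \lambda^{n+1} e^{-2\lambda}\, d\lambda
\end{aligned}$$

However, because

$$h(\lambda) = \frac{2 e^{-2\lambda} (2\lambda)^{n+1}}{(n+1)!}, \quad \lambda > 0$$

is the density function of a gamma $(n + 2, 2)$ random variable, its integral is 1.
Therefore,

$$1 = \int_0^{\infty} \frac{2 e^{-2\lambda} (2\lambda)^{n+1}}{(n+1)!}\, d\lambda = \frac{2^{n+2}}{(n+1)!} \int_0^{\infty} \lambda^{n+1} e^{-2\lambda}\, d\lambda$$

showing that

$$P\{X = n\} = \frac{n+1}{2^{n+2}}$$

■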
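
The result of Example 3.22 can be checked numerically. The sketch below is only an illustration: the function names, the truncation point 60, and the step count are assumptions, and the midpoint rule is just one convenient way to approximate the conditioning integral.

```python
from fractions import Fraction
import math

def accident_pmf(n):
    """Closed form from the example: P{X = n} = (n + 1) / 2^(n + 2)."""
    return Fraction(n + 1, 2 ** (n + 2))

def accident_pmf_numeric(n, upper=60.0, steps=200_000):
    """Midpoint-rule value of the integral of (e^-lam lam^n / n!) * (lam e^-lam) over [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        total += math.exp(-2.0 * lam) * lam ** (n + 1)
    return total * width / math.factorial(n)

# The closed form and the direct integral should agree, and the pmf should sum to 1.
print(float(accident_pmf(3)), accident_pmf_numeric(3))
print(float(sum(accident_pmf(n) for n in range(60))))
```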


Example 3.23 Suppose that the number of people who visit a yoga studio each
day is a Poisson random variable with mean $\lambda$. Suppose further that each person
who visits is, independently, female with probability $p$ or male with probability
$1 - p$. Find the joint probability that exactly $n$ women and $m$ men visit the
studio today.


Solution: Let $N_1$ denote the number of women and $N_2$ the number of men
who visit the studio today. Also, let $N = N_1 + N_2$ be the total number of
people who visit. Conditioning on $N$ gives


$$P\{N_1 = n, N_2 = m\} = \sum_{i=0}^{\infty} P\{N_1 = n, N_2 = m \mid N = i\} P\{N = i\}$$

Because $P\{N_1 = n, N_2 = m \mid N = i\} = 0$ when $i \ne n + m$, the preceding equation
yields

$$P\{N_1 = n, N_2 = m\} = P\{N_1 = n, N_2 = m \mid N = n + m\}\, e^{-\lambda} \frac{\lambda^{n+m}}{(n+m)!}$$

Given that $n + m$ people visit it follows, because each of these $n + m$ is
independently a woman with probability $p$, that the conditional probability
that $n$ of them are women (and $m$ are men) is just the binomial probability of
$n$ successes in $n + m$ trials. Therefore,

$$\begin{aligned}
P\{N_1 = n, N_2 = m\} &= \binom{n+m}{n} p^n (1-p)^m e^{-\lambda} \frac{\lambda^{n+m}}{(n+m)!} \\
&= \frac{(n+m)!}{n!\, m!} p^n (1-p)^m e^{-\lambda p} e^{-\lambda(1-p)} \frac{\lambda^n \lambda^m}{(n+m)!} \\
&= e^{-\lambda p} \frac{(\lambda p)^n}{n!}\, e^{-\lambda(1-p)} \frac{(\lambda(1-p))^m}{m!}
\end{aligned}$$


Because the preceding joint probability mass function factors into two products,
one of which depends only on $n$ and the other only on $m$, it follows that
$N_1$ and $N_2$ are independent. Moreover, because

$$\begin{aligned}
P\{N_1 = n\} &= \sum_{m=0}^{\infty} P\{N_1 = n, N_2 = m\} \\
&= e^{-\lambda p} \frac{(\lambda p)^n}{n!} \sum_{m=0}^{\infty} e^{-\lambda(1-p)} \frac{(\lambda(1-p))^m}{m!} = e^{-\lambda p} \frac{(\lambda p)^n}{n!}
\end{aligned}$$

and, similarly,

$$P\{N_2 = m\} = e^{-\lambda(1-p)} \frac{(\lambda(1-p))^m}{m!}$$

we can conclude that $N_1$ and $N_2$ are independent Poisson random variables
with respective means $\lambda p$ and $\lambda(1 - p)$. Therefore, this example establishes the
important result that when each of a Poisson number of events is independently
classified either as being type 1 with probability $p$ or type 2 with probability
$1 - p$, then the numbers of type 1 and type 2 events are independent Poisson
random variables. ■
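
The factorization established in Example 3.23 is easy to confirm numerically. In the sketch below the function names and the test values $\lambda = 3$, $p = 0.4$ are assumptions chosen for illustration; the first function follows the conditioning argument (binomial split of $n + m$ arrivals), the second uses the product of the two Poisson mass functions.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability mass function e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def joint_by_conditioning(n, m, lam, p):
    """P{N1 = n, N2 = m}: binomial split of n + m visitors times P{N = n + m}."""
    return math.comb(n + m, n) * p ** n * (1 - p) ** m * poisson_pmf(n + m, lam)

def joint_as_product(n, m, lam, p):
    """Product of independent Poisson(lam * p) and Poisson(lam * (1 - p)) masses."""
    return poisson_pmf(n, lam * p) * poisson_pmf(m, lam * (1 - p))

# Illustrative values: the two computations should agree up to rounding.
for n in range(3):
    for m in range(3):
        print(n, m, joint_by_conditioning(n, m, 3.0, 0.4), joint_as_product(n, m, 3.0, 0.4))
```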


The result of Example 3.23 generalizes to the case where each of a Poisson
distributed number of events, $N$, with mean $\lambda$ is independently classified as
being one of $k$ types, with the probability that it is type $i$ being $p_i$, $i = 1, \ldots, k$,
$\sum_{i=1}^{k} p_i = 1$. If $N_i$ is the number that are classified as type $i$, then $N_1, \ldots, N_k$ are
independent Poisson random variables with respective means $\lambda p_1, \ldots, \lambda p_k$. This
follows, since for $n = \sum_{i=1}^{k} n_i$

$$\begin{aligned}
P(N_1 = n_1, \ldots, N_k = n_k) &= P(N_1 = n_1, \ldots, N_k = n_k \mid N = n) P(N = n) \\
&= \frac{n!}{n_1! \cdots n_k!} p_1^{n_1} \cdots p_k^{n_k}\, e^{-\lambda} \frac{\lambda^n}{n!} \\
&= \prod_{i=1}^{k} e^{-\lambda p_i} \frac{(\lambda p_i)^{n_i}}{n_i!}
\end{aligned}$$

where the second equality used that, given a total of $n$ events, the numbers of
each type have a multinomial distribution with parameters $(n, p_1, \ldots, p_k)$.


Example 3.24 (The Distribution of the Sum of Independent Bernoulli Random
Variables) Let $X_1, \ldots, X_n$ be independent Bernoulli random variables, with $X_i$
having parameter $p_i$, $i = 1, \ldots, n$. That is, $P\{X_i = 1\} = p_i$, $P\{X_i = 0\} = q_i = 1 - p_i$.
Suppose we want to compute the probability mass function of their sum,
$X_1 + \cdots + X_n$. To do so, we will recursively obtain the probability mass function
of $X_1 + \cdots + X_k$, first for $k = 1$, then $k = 2$, and on up to $k = n$. To begin, let

$$P_k(j) = P\{X_1 + \cdots + X_k = j\}$$

and note that

$$P_k(k) = \prod_{i=1}^{k} p_i, \qquad P_k(0) = \prod_{i=1}^{k} q_i$$

For $0 < j < k$, conditioning on $X_k$ yields the recursion

$$\begin{aligned}
P_k(j) &= P\{X_1 + \cdots + X_k = j \mid X_k = 1\} p_k + P\{X_1 + \cdots + X_k = j \mid X_k = 0\} q_k \\
&= P\{X_1 + \cdots + X_{k-1} = j - 1 \mid X_k = 1\} p_k + P\{X_1 + \cdots + X_{k-1} = j \mid X_k = 0\} q_k \\
&= P\{X_1 + \cdots + X_{k-1} = j - 1\} p_k + P\{X_1 + \cdots + X_{k-1} = j\} q_k \\
&= p_k P_{k-1}(j - 1) + q_k P_{k-1}(j)
\end{aligned}$$

Starting with $P_1(1) = p_1$, $P_1(0) = q_1$, the preceding equations can be recursively
solved to obtain the functions $P_2(j)$, $P_3(j)$, up to $P_n(j)$. ■
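
The recursion of Example 3.24 translates directly into a short program. The following is a minimal sketch, not a definitive implementation: the function name `bernoulli_sum_pmf` is ours, and for convenience it starts from the assumed convention $P_0(0) = 1$ (an empty sum equals 0) and distributes each mass forward, which is equivalent to $P_k(j) = p_k P_{k-1}(j-1) + q_k P_{k-1}(j)$ together with the boundary values given above.

```python
from fractions import Fraction

def bernoulli_sum_pmf(probs):
    """Return [P_n(0), ..., P_n(n)] for the sum of independent Bernoulli(p_i) variables."""
    pmf = [Fraction(1)]                # assumed start: P_0(0) = 1
    for p in probs:
        q = 1 - p
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += q * mass         # X_k = 0 keeps the sum at j
            nxt[j + 1] += p * mass     # X_k = 1 moves the sum to j + 1
        pmf = nxt
    return pmf

# Illustrative parameters p_1 = 1/2, p_2 = 1/3, p_3 = 1/4.
pmf = bernoulli_sum_pmf([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
print(pmf)
```

Each stage costs a number of operations proportional to $k$, so the whole mass function is obtained in order $n^2$ operations rather than by summing over all $2^n$ outcomes.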


Example 3.25 (The Best Prize Problem) Suppose that we are to be presented 
with $n$ distinct prizes in sequence. After being presented with a prize we must
immediately decide whether to accept it or reject it and consider the next prize. 
The only information we are given when deciding whether to accept a prize is the 
relative rank of that prize compared to ones already seen. That is, for instance, 
when the fifth prize is presented we learn how it compares with the first four 
prizes already seen. Suppose that once a prize is rejected it is lost, and that our 
objective is to maximize the probability of obtaining the best prize. Assuming 
that all $n!$ orderings of the prizes are equally likely, how well can we do?


Solution: Rather surprisingly, we can do quite well. To see this, fix a value
$k$, $0 \le k < n$, and consider the strategy that rejects the first $k$ prizes and then
accepts the first one that is better than all of those first $k$. Let $P_k(\text{best})$ denote
the probability that the best prize is selected when this strategy is employed.
To compute this probability, condition on $X$, the position of the best prize.
This gives

$$\begin{aligned}
P_k(\text{best}) &= \sum_{i=1}^{n} P_k(\text{best} \mid X = i) P(X = i) \\
&= \frac{1}{n} \sum_{i=1}^{n} P_k(\text{best} \mid X = i)
\end{aligned}$$


Now, if the overall best prize is among the first k, then no prize is ever selected 
under the strategy considered. On the other hand, if the best prize is in posi- 
tion i, where i > k, then the best prize will be selected if the best of the first 
k prizes is also the best of the first i — 1 prizes (for then none of the prizes in 
positions k + 1,k +2,...,i—1 would be selected). Hence, we see that 


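$$\begin{aligned}
P_k(\text{best} \mid X = i) &= 0, && \text{if } i \le k \\
P_k(\text{best} \mid X = i) &= P\{\text{best of first } i - 1 \text{ is among the first } k\} \\
&= k/(i - 1), && \text{if } i > k
\end{aligned}$$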


From the preceding, we obtain 


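$$\begin{aligned}
P_k(\text{best}) &= \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i - 1} \\
&\approx \frac{k}{n} \int_{k}^{n-1} \frac{1}{x}\, dx \\
&= \frac{k}{n} \log\left(\frac{n-1}{k}\right) \\
&\approx \frac{k}{n} \log\left(\frac{n}{k}\right)
\end{aligned}$$

Now, if we consider the function

$$g(x) = \frac{x}{n} \log\left(\frac{n}{x}\right)$$

then

$$g'(x) = \frac{1}{n} \log\left(\frac{n}{x}\right) - \frac{1}{n}$$

and so

$$g'(x) = 0 \Rightarrow \log(n/x) = 1 \Rightarrow x = n/e$$

Thus, since $P_k(\text{best}) \approx g(k)$, we see that the best strategy of the type considered
is to let the first $n/e$ prizes go by and then accept the first one to appear that
is better than all of those. In addition, since $g(n/e) = 1/e$, the probability that
this strategy selects the best prize is approximately $1/e \approx 0.36788$.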
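
To see how good the approximation $k \approx n/e$ is, one can evaluate the exact expression $P_k(\text{best}) = \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i-1}$ for every cutoff. The sketch below does this with exact rational arithmetic; the function names and the choice $n = 100$ are assumptions made for illustration.

```python
import math
from fractions import Fraction

def prob_best(n, k):
    """Exact P_k(best) = (k/n) * sum_{i=k+1}^{n} 1/(i-1), for 1 <= k < n."""
    return Fraction(k, n) * sum(Fraction(1, i - 1) for i in range(k + 1, n + 1))

def best_cutoff(n):
    """Cutoff k in 1..n-1 that maximizes prob_best(n, k)."""
    return max(range(1, n), key=lambda k: prob_best(n, k))

# Illustrative run with n = 100 prizes: compare the exact optimum with n/e and 1/e.
n = 100
k_star = best_cutoff(n)
print(k_star, n / math.e)
print(float(prob_best(n, k_star)), 1 / math.e)
```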


Remark Most students are quite surprised by the size of the probability of
obtaining the best prize, thinking that this probability would be close to 0 when
$n$ is large. However, even without going through the calculations, a little thought
reveals that the probability of obtaining the best prize can be made to be reasonably
large. Consider the strategy of letting half of the prizes go by, and then
selecting the first one to appear that is better than all of those. The probability
that a prize is actually selected is the probability that the overall best is among
the second half and this is 1/2. In addition, given that a prize is selected, at the
time of selection that prize would have been the best of more than $n/2$ prizes
to have appeared, and would thus have probability of at least 1/2 of being the
overall best. Hence, the strategy of letting the first half of all prizes go by and then
accepting the first one that is better than all of those prizes results in a probability
greater than 1/4 of obtaining the best prize. ■


Example 3.26 At a party $n$ men take off their hats. The hats are then mixed up
and each man randomly selects one. We say that a match occurs if a man selects
his own hat. What is the probability of no matches? What is the probability of
exactly $k$ matches?


Solution: Let $E$ denote the event that no matches occur, and to make explicit
the dependence on $n$, write $P_n = P(E)$. We start by conditioning on whether
or not the first man selects his own hat; call these events $M$ and $M^c$. Then

$$P_n = P(E) = P(E \mid M) P(M) + P(E \mid M^c) P(M^c)$$

Clearly, $P(E \mid M) = 0$, and so

$$P_n = P(E \mid M^c)\, \frac{n-1}{n} \qquad (3.9)$$

Now, $P(E \mid M^c)$ is the probability of no matches when $n - 1$ men select from a
set of $n - 1$ hats that does not contain the hat of one of these men. This can
happen in either of two mutually exclusive ways. Either there are no matches
and the extra man does not select the extra hat (this being the hat of the man
that chose first), or there are no matches and the extra man does select the extra
hat. The probability of the first of these events is just $P_{n-1}$, which is seen by
regarding the extra hat as “belonging” to the extra man. Because the second
event has probability $[1/(n - 1)] P_{n-2}$, we have

$$P(E \mid M^c) = P_{n-1} + \frac{1}{n-1} P_{n-2}$$

and thus, from Equation (3.9),

$$P_n = \frac{n-1}{n} P_{n-1} + \frac{1}{n} P_{n-2}$$

or, equivalently,

$$P_n - P_{n-1} = -\frac{1}{n} (P_{n-1} - P_{n-2}) \qquad (3.10)$$


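However, because $P_n$ is the probability of no matches when $n$ men select among
their own hats, we have

$$P_1 = 0, \qquad P_2 = \tfrac{1}{2}$$

and so, from Equation (3.10),

$$\begin{aligned}
P_3 - P_2 &= -\frac{(P_2 - P_1)}{3} = -\frac{1}{3!} \quad \text{or} \quad P_3 = \frac{1}{2!} - \frac{1}{3!} \\
P_4 - P_3 &= -\frac{(P_3 - P_2)}{4} = \frac{1}{4!} \quad \text{or} \quad P_4 = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!}
\end{aligned}$$

and, in general,

$$P_n = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}$$

To obtain the probability of exactly $k$ matches, we consider any fixed group
of $k$ men. The probability that they, and only they, select their own hats is

$$\frac{1}{n} \frac{1}{n-1} \cdots \frac{1}{n-(k-1)} P_{n-k} = \frac{(n-k)!}{n!} P_{n-k}$$

where $P_{n-k}$ is the conditional probability that the other $n - k$ men, selecting
among their own hats, have no matches. Because there are $\binom{n}{k}$ choices of a set
of $k$ men, the desired probability of exactly $k$ matches is

$$\frac{P_{n-k}}{k!} = \frac{\dfrac{1}{2!} - \dfrac{1}{3!} + \cdots + \dfrac{(-1)^{n-k}}{(n-k)!}}{k!}$$

which, for $n$ large, is approximately equal to $e^{-1}/k!$.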
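
The recursion $P_n = \frac{n-1}{n} P_{n-1} + \frac{1}{n} P_{n-2}$ is convenient for computation. The sketch below is an illustration under stated assumptions: the function names are ours, and the starting value $P_0 = 1$ is a convention we assume (with no men there can be no match), chosen so that the recursion reproduces $P_2 = 1/2$.

```python
from fractions import Fraction
from math import exp, factorial

def no_match_probs(n_max):
    """[P_0, ..., P_{n_max}] from P_n = ((n-1)/n) P_{n-1} + (1/n) P_{n-2}."""
    probs = [Fraction(1), Fraction(0)]   # assumed convention P_0 = 1, and P_1 = 0
    for n in range(2, n_max + 1):
        probs.append(Fraction(n - 1, n) * probs[n - 1] + Fraction(1, n) * probs[n - 2])
    return probs[: n_max + 1]

def exactly_k_matches(n, k):
    """P{exactly k matches among n men} = P_{n-k} / k!."""
    return no_match_probs(n)[n - k] / factorial(k)

# P_n approaches 1/e quickly, and the match-count probabilities sum to 1.
print(float(no_match_probs(12)[12]), exp(-1))
print(sum(exactly_k_matches(5, k) for k in range(6)))
```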


Remark The recursive equation, Equation (3.10), could also have been obtained
by using the concept of a cycle, where we say that the sequence of distinct individuals
$i_1, i_2, \ldots, i_k$ constitutes a cycle if $i_1$ chooses $i_2$’s hat, $i_2$ chooses $i_3$’s hat, ..., $i_{k-1}$
chooses $i_k$’s hat, and $i_k$ chooses $i_1$’s hat. Note that every individual is part of a
cycle, and that a cycle of size $k = 1$ occurs when someone chooses his or her own
hat. With $E$ being, as before, the event that no matches occur, it follows upon
conditioning on the size of the cycle containing a specified person, say person 1,
that

$$P_n = P(E) = \sum_{k=1}^{n} P(E \mid C = k) P(C = k) \qquad (3.11)$$


where C is the size of the cycle that contains person 1. Now, call person 1 the 
first person, and note that C = k if the first person does not choose 1’s hat; 
the person whose hat was chosen by the first person—call this person the second 
person—does not choose 1’s hat; the person whose hat was chosen by the second 
person—call this person the third person—does not choose 1’s hat; ..., the person 
whose hat was chosen by the (k — 1)st person does choose 1’s hat. Consequently, 


$$P(C = k) = \frac{n-1}{n} \frac{n-2}{n-1} \cdots \frac{n-k+1}{n-k+2} \frac{1}{n-k+1} = \frac{1}{n} \qquad (3.12)$$


That is, the size of the cycle that contains a specified person is equally likely to
be any of the values $1, 2, \ldots, n$. Moreover, since $C = 1$ means that 1 chooses his
or her own hat, it follows that

$$P(E \mid C = 1) = 0 \qquad (3.13)$$

On the other hand, if $C = k$, then the set of hats chosen by the $k$ individuals
in this cycle is exactly the set of hats of these individuals. Hence, conditional on
$C = k$, the problem reduces to determining the probability of no matches when
$n - k$ people randomly choose among their own $n - k$ hats. Therefore, for $k > 1$

$$P(E \mid C = k) = P_{n-k} \qquad (3.14)$$

Substituting (3.12), (3.13), and (3.14) back into Equation (3.11) gives

$$P_n = \frac{1}{n} \sum_{k=2}^{n} P_{n-k}$$

which, with $P_0 = 1$, is easily shown to be equivalent to Equation (3.10). ■


Example 3.27 (The Ballot Problem) In an election, candidate A receives $n$ votes,
and candidate B receives $m$ votes where $n > m$. Assuming that all orderings are
equally likely, show that the probability that A is always ahead in the count of
votes is $(n - m)/(n + m)$.


Solution: Let $P_{n,m}$ denote the desired probability. By conditioning on which
candidate receives the last vote counted we have

$$\begin{aligned}
P_{n,m} &= P\{A \text{ always ahead} \mid A \text{ receives last vote}\} \frac{n}{n+m} \\
&\quad + P\{A \text{ always ahead} \mid B \text{ receives last vote}\} \frac{m}{n+m}
\end{aligned}$$

Now, given that A receives the last vote, we can see that the probability that A
is always ahead is the same as if A had received a total of $n - 1$ and B a total
of $m$ votes. Because a similar result is true when we are given that B receives
the last vote, we see from the preceding that

$$P_{n,m} = \frac{n}{n+m} P_{n-1,m} + \frac{m}{m+n} P_{n,m-1} \qquad (3.15)$$


We can now prove that $P_{n,m} = (n - m)/(n + m)$ by induction on $n + m$. As
it is true when $n + m = 1$, that is, $P_{1,0} = 1$, assume it whenever $n + m = k$.
Then when $n + m = k + 1$, we have by Equation (3.15) and the induction
hypothesis that

$$\begin{aligned}
P_{n,m} &= \frac{n}{n+m} \frac{n-1-m}{n-1+m} + \frac{m}{m+n} \frac{n-m+1}{n+m-1} \\
&= \frac{n-m}{n+m}
\end{aligned}$$

and the result is proven. ■
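
The ballot recursion (3.15) can be checked against both the closed form and a direct enumeration of vote orderings. The sketch below is illustrative only: the function names are ours, and the boundary values ($P_{n,0} = 1$ for $n > 0$, and $P_{n,m} = 0$ whenever $n \le m$) are assumptions that follow from the meaning of "always ahead" rather than from a formula in the text.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def ballot_prob(n, m):
    """P_{n,m} from the recursion obtained by conditioning on the last vote."""
    if m == 0:
        return Fraction(1) if n > 0 else Fraction(0)
    if n <= m:
        return Fraction(0)               # A cannot stay strictly ahead
    total = n + m
    return (Fraction(n, total) * ballot_prob(n - 1, m)
            + Fraction(m, total) * ballot_prob(n, m - 1))

def ballot_prob_brute_force(n, m):
    """Enumerate every placement of A's votes; count orderings with A always ahead."""
    total = n + m
    favorable = count = 0
    for a_positions in combinations(range(total), n):
        a_set = set(a_positions)
        lead, ok = 0, True
        for t in range(total):
            lead += 1 if t in a_set else -1
            if lead <= 0:
                ok = False
                break
        favorable += ok
        count += 1
    return Fraction(favorable, count)

# Illustrative check with n = 5, m = 3, where (n - m)/(n + m) = 1/4.
print(ballot_prob(5, 3), ballot_prob_brute_force(5, 3), Fraction(5 - 3, 5 + 3))
```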




The ballot problem has some interesting applications. For example, consider
successive flips of a coin that lands on “heads” with probability $p$ on each flip, and let
us determine the probability distribution of the first time, after beginning, that
the total number of heads is equal to the total number of tails. The probability
that the first time this occurs is at time $2n$ can be obtained by first conditioning
on the total number of heads in the first $2n$ trials. This yields

$$P\{\text{first time equal} = 2n\} = P\{\text{first time equal} = 2n \mid n \text{ heads in first } 2n\} \binom{2n}{n} p^n (1-p)^n$$

Now, given a total of $n$ heads in the first $2n$ flips we can see that all possible
orderings of the $n$ heads and $n$ tails are equally likely, and thus the preceding
conditional probability is equivalent to the probability that in an election, in
which each candidate receives $n$ votes, one of the candidates is always ahead
in the counting until the last vote (which ties them). But by conditioning on
whoever receives the last vote, we see that this is just the probability in the
ballot problem when $m = n - 1$. Hence,

$$\begin{aligned}
P\{\text{first time equal} = 2n\} &= P_{n,n-1} \binom{2n}{n} p^n (1-p)^n \\
&= \frac{\binom{2n}{n} p^n (1-p)^n}{2n - 1}
\end{aligned}$$


Suppose now that we wanted to determine the probability that the first time
there are $i$ more heads than tails occurs after the $(2n + i)$th flip. Now, in order
for this to be the case, the following two events must occur:


(a) The first $2n + i$ tosses result in $n + i$ heads and $n$ tails; and
(b) The order in which the $n + i$ heads and $n$ tails occur is such that the number of heads
is never $i$ more than the number of tails until after the final flip.


Now, it is easy to see that event (b) will occur if and only if the order of appearance
of the $n + i$ heads and $n$ tails is such that starting from the final flip and working
backwards heads is always in the lead. For instance, if there are 4 heads and 2
tails ($n = 2$, $i = 2$), then the outcome _ _ _ _ T H would not suffice because there
would have been 2 more heads than tails sometime before the sixth flip (since the
first 4 flips resulted in 2 more heads than tails).

Now, the probability of the event specified in (a) is just the binomial probability
of getting $n + i$ heads and $n$ tails in $2n + i$ flips of the coin.

We must now determine the conditional probability of the event specified in
(b) given that there are $n + i$ heads and $n$ tails in the first $2n + i$ flips. To do so,
note first that given that there are a total of $n + i$ heads and $n$ tails in the first
$2n + i$ flips, all possible orderings of these flips are equally likely. As a result,
the conditional probability of (b) given (a) is just the probability that a random
ordering of $n + i$ heads and $n$ tails will, when counted in reverse order, always
have more heads than tails. Since all reverse orderings are also equally likely, it
follows from the ballot problem that this conditional probability is $i/(2n + i)$.
That is, we have shown that

$$P\{a\} = \binom{2n+i}{n} p^{n+i} (1-p)^n, \qquad P\{b \mid a\} = \frac{i}{2n+i}$$

and so

$$P\{\text{first time heads leads by } i \text{ is after flip } 2n + i\} = \binom{2n+i}{n} p^{n+i} (1-p)^n\, \frac{i}{2n+i}$$
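
For small values of $n$ and $i$ the preceding formula can be verified by enumerating all flip sequences of length $2n + i$. The following sketch does so with exact arithmetic; the function names and the test value $p = 1/3$ are assumptions made for illustration, and the enumeration is only practical for short sequences.

```python
from fractions import Fraction
from itertools import product
from math import comb

def first_lead_prob(n, i, p):
    """Closed form: P{first time heads leads by i is after flip 2n + i}, for i >= 1."""
    return comb(2 * n + i, n) * p ** (n + i) * (1 - p) ** n * Fraction(i, 2 * n + i)

def first_lead_prob_brute_force(n, i, p):
    """Sum the probabilities of all sequences whose first lead of i occurs at flip 2n + i."""
    length = 2 * n + i
    total = Fraction(0)
    for flips in product((1, -1), repeat=length):    # +1 = head, -1 = tail
        lead, first_hit = 0, None
        for t, step in enumerate(flips, start=1):
            lead += step
            if lead == i:
                first_hit = t
                break
        if first_hit == length:
            heads = flips.count(1)
            total += p ** heads * (1 - p) ** (length - heads)
    return total

p = Fraction(1, 3)                                   # illustrative head probability
print(first_lead_prob(2, 2, p), first_lead_prob_brute_force(2, 2, p))
```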


Example 3.28 Let $U_1, U_2, \ldots$ be a sequence of independent uniform (0, 1) random
variables, and let

$$N = \min\{n \ge 2 : U_n > U_{n-1}\}$$

and

$$M = \min\{n \ge 1 : U_1 + \cdots + U_n > 1\}$$

That is, $N$ is the index of the first uniform random variable that is larger than
its immediate predecessor, and $M$ is the number of uniform random variables
we need to sum to exceed 1. Surprisingly, $N$ and $M$ have the same probability
distribution, and their common mean is $e$!


Solution: It is easy to find the distribution of $N$. Since all $n!$ possible orderings
of $U_1, \ldots, U_n$ are equally likely, we have

$$P\{N > n\} = P\{U_1 > U_2 > \cdots > U_n\} = 1/n!$$


To show that $P\{M > n\} = 1/n!$, we will use mathematical induction. However,
to give ourselves a stronger result to use as the induction hypothesis, we will
prove the stronger result that for $0 < x \le 1$, $P\{M(x) > n\} = x^n/n!$, $n \ge 1$,
where

$$M(x) = \min\{n \ge 1 : U_1 + \cdots + U_n > x\}$$

is the minimum number of uniforms that need be summed to exceed $x$. To
prove that $P\{M(x) > n\} = x^n/n!$, note first that it is true for $n = 1$ since

$$P\{M(x) > 1\} = P\{U_1 \le x\} = x$$

So assume that for all $0 < x \le 1$, $P\{M(x) > n\} = x^n/n!$. To determine
$P\{M(x) > n + 1\}$, condition on $U_1$ to obtain

$$\begin{aligned}
P\{M(x) > n + 1\} &= \int_0^1 P\{M(x) > n + 1 \mid U_1 = y\}\, dy \\
&= \int_0^x P\{M(x) > n + 1 \mid U_1 = y\}\, dy \\
&= \int_0^x P\{M(x - y) > n\}\, dy \\
&= \int_0^x \frac{(x-y)^n}{n!}\, dy \quad \text{by the induction hypothesis} \\
&= \int_0^x \frac{u^n}{n!}\, du \\
&= \frac{x^{n+1}}{(n+1)!}
\end{aligned}$$


where the third equality of the preceding follows from the fact that given $U_1 = y$,
$M(x)$ is distributed as 1 plus the number of uniforms that need be summed
to exceed $x - y$. Thus, the induction is complete and we have shown that for
$0 < x \le 1$, $n \ge 1$,

$$P\{M(x) > n\} = x^n/n!$$

Letting $x = 1$ shows that $N$ and $M$ have the same distribution. Finally, we have

$$E[M] = E[N] = \sum_{n=0}^{\infty} P\{N > n\} = \sum_{n=0}^{\infty} 1/n! = e$$

■

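The claim that $E[N] = E[M] = e$ lends itself to a quick simulation. The sketch below is only illustrative: the seed, the number of trials, and the function names are arbitrary assumptions, and the sample means should be close to, not equal to, $e \approx 2.71828$.

```python
import math
import random

def sample_n(rng):
    """Index of the first uniform that exceeds its immediate predecessor."""
    prev, n = rng.random(), 1
    while True:
        cur = rng.random()
        n += 1
        if cur > prev:
            return n
        prev = cur

def sample_m(rng):
    """Number of uniforms needed for the running sum to exceed 1."""
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n

rng = random.Random(12345)            # fixed seed so the run is repeatable
trials = 200_000
mean_n = sum(sample_n(rng) for _ in range(trials)) / trials
mean_m = sum(sample_m(rng) for _ in range(trials)) / trials
print(mean_n, mean_m, math.e)
```
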
Example 3.29 Let $X_1, X_2, \ldots$ be independent continuous random variables with
a common distribution function $F$ and density $f = F'$, and suppose that they are
to be observed one at a time in sequence. Let

$$N = \min\{n \ge 2 : X_n = \text{second largest of } X_1, \ldots, X_n\}$$

and let

$$M = \min\{n \ge 2 : X_n = \text{second smallest of } X_1, \ldots, X_n\}$$

Which random variable tends to be larger: $X_N$, the first random variable which when observed
is the second largest of those that have been seen, or $X_M$, the first one that on
observation is the second smallest to have been seen?

Solution: To calculate the probability density function of $X_N$, it is natural to
condition on the value of $N$; so let us start by determining its probability mass
function. Now, if we let

$$A_i = \{X_i \ne \text{second largest of } X_1, \ldots, X_i\}, \quad i \ge 2$$

then, for $n \ge 2$,

$$P\{N = n\} = P(A_2 A_3 \cdots A_{n-1} A_n^c)$$


Since the $X_i$ are independent and identically distributed it follows that, for
any $m \ge 1$, knowing the rank ordering of the variables $X_1, \ldots, X_m$ yields no
information about the set of $m$ values $\{X_1, \ldots, X_m\}$. That is, for instance,
knowing that $X_1 < X_2$ gives us no information about the values of $\min(X_1, X_2)$
or $\max(X_1, X_2)$. It follows from this that the events $A_i$, $i \ge 2$ are independent.
Also, since $X_i$ is equally likely to be the largest, or the second largest, ..., or
the $i$th largest of $X_1, \ldots, X_i$ it follows that $P\{A_i\} = (i - 1)/i$, $i \ge 2$. Therefore,
we see that

$$P\{N = n\} = \frac{1}{2} \frac{2}{3} \frac{3}{4} \cdots \frac{n-2}{n-1} \frac{1}{n} = \frac{1}{n(n-1)}$$

Hence, conditioning on $N$ yields that the probability density function of $X_N$ is
as follows:

$$f_{X_N}(x) = \sum_{n=2}^{\infty} \frac{1}{n(n-1)} f_{X_N \mid N}(x \mid n)$$


Now, since the ordering of the variables $X_1, \ldots, X_n$ is independent of the set
of values $\{X_1, \ldots, X_n\}$, it follows that the event $\{N = n\}$ is independent of
$\{X_1, \ldots, X_n\}$. From this, it follows that the conditional distribution of $X_N$ given
that $N = n$ is equal to the distribution of the second largest from a set of $n$ random
variables having distribution $F$. Thus, using the results of Example 2.38
concerning the density function of such a random variable, we obtain

$$\begin{aligned}
f_{X_N}(x) &= \sum_{n=2}^{\infty} \frac{1}{n(n-1)} \frac{n!}{(n-2)!\, 1!} (F(x))^{n-2} f(x) (1 - F(x)) \\
&= f(x)(1 - F(x)) \sum_{i=0}^{\infty} (F(x))^i \\
&= f(x)
\end{aligned}$$

Thus, rather surprisingly, $X_N$ has the same distribution as $X_1$, namely, $F$. Also,
if we now let $W_i = -X_i$, $i \ge 1$, then $W_M$ will be the value of the first $W_i$, which
on observation is the second largest of all those that have been seen. Hence,
by the preceding, it follows that $W_M$ has the same distribution as $W_1$. That is,
$-X_M$ has the same distribution as $-X_1$, and so $X_M$ also has distribution $F$! In
other words, whether we stop at the first random variable that is the second
largest of all those presently observed, or we stop at the first one that is the
second smallest of all those presently observed, we will end up with a random
variable having distribution $F$.

Whereas the preceding result is quite surprising, it is a special case of a
general result known as Ignatov’s theorem, which yields even more surprises.
For instance, for $k \ge 1$, let

$$N_k = \min\{n \ge k : X_n = k\text{th largest of } X_1, \ldots, X_n\}$$

Therefore, $N_2$ is what we previously called $N$, and $X_{N_k}$ is the first random
variable that upon observation is the $k$th largest of all those observed up to
this point. It can then be shown by a similar argument as used in the preceding
that $X_{N_k}$ has distribution function $F$ for all $k$ (see Exercise 82 at the end of this
chapter). In addition, it can be shown that the random variables $X_{N_k}$, $k \ge 1$
are independent. (A statement and proof of Ignatov’s theorem in the case of
discrete random variables are given in Section 3.6.6.) ■


Example 3.30 A population consists of $m$ families. Let $X_j$ denote the size of
family $j$, and suppose that $X_1, \ldots, X_m$ are independent random variables having
the common probability mass function

$$p_k = P(X_j = k), \quad \sum_{k=1}^{\infty} p_k = 1$$

with mean $\mu = \sum_k k p_k$. Suppose a member of the population is randomly chosen,
in that the selection is equally likely to be any of the members of the population,
and let $S_i$ be the event that the selected individual is from a family of size $i$. Argue
that

$$P(S_i) \to \frac{i p_i}{\mu} \quad \text{as } m \to \infty$$


Solution: A heuristic argument for the preceding formula is that because each
family is of size $i$ with probability $p_i$, it follows that there are approximately
$m p_i$ families of size $i$ when $m$ is large. Thus, $i m p_i$ members of the population
come from a family of size $i$, implying that the probability that the selected
individual is from a family of size $i$ is approximately

$$\frac{i m p_i}{\sum_j j m p_j} = \frac{i p_i}{\mu}$$

For a more formal argument, let $N_i$ denote the number of families that are
of size $i$. That is,

$$N_i = \text{number } \{k : k = 1, \ldots, m : X_k = i\}$$

Then, conditional on $\mathbf{X} = (X_1, \ldots, X_m)$

$$P(S_i \mid \mathbf{X}) = \frac{i N_i}{\sum_{k=1}^{m} X_k}$$


Hence,

$$\begin{aligned}
P(S_i) &= E[P(S_i \mid \mathbf{X})] \\
&= E\left[\frac{i N_i}{\sum_{k=1}^{m} X_k}\right] \\
&= E\left[\frac{i N_i/m}{\sum_{k=1}^{m} X_k/m}\right]
\end{aligned}$$

Because each family is independently of size $i$ with probability $p_i$, it follows by
the strong law of large numbers that $N_i/m$, the fraction of families that are of
size $i$, converges to $p_i$ as $m \to \infty$. Also by the strong law of large numbers,
$\sum_{k=1}^{m} X_k/m \to E[X] = \mu$ as $m \to \infty$. Consequently, with probability 1,

$$\frac{i N_i/m}{\sum_{k=1}^{m} X_k/m} \to \frac{i p_i}{\mu} \quad \text{as } m \to \infty$$

Because the random variable $\frac{i N_i}{\sum_{k=1}^{m} X_k}$ converges to $\frac{i p_i}{\mu}$, so does its expectation,
which proves the result. (While it is not always the case that $\lim_{m \to \infty} Y_m = c$
implies that $\lim_{m \to \infty} E[Y_m] = c$, the implication is true when the $Y_m$
are uniformly bounded random variables, and the random variables $\frac{i N_i}{\sum_{k=1}^{m} X_k}$
are all between 0 and 1.) ■
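
The convergence in Example 3.30 can be observed in a simulation. In the sketch below the family-size mass function, the seed, the number of families, and the function name are all hypothetical choices made for illustration; the quantity computed for each $i$ is the conditional probability $i N_i / \sum_k X_k$ for one realized population, which should be close to $i p_i / \mu$ when $m$ is large.

```python
import random
from collections import Counter

def size_biased_fractions(sizes):
    """For realized family sizes X_1..X_m, return {i: i * N_i / sum_k X_k}."""
    counts = Counter(sizes)
    population = sum(sizes)
    return {i: i * n_i / population for i, n_i in counts.items()}

pmf = {1: 0.5, 2: 0.3, 3: 0.2}                 # hypothetical family-size pmf p_k
mu = sum(k * p for k, p in pmf.items())        # mean family size
rng = random.Random(2024)                      # fixed seed so the run is repeatable
families = rng.choices(list(pmf), weights=list(pmf.values()), k=200_000)

simulated = size_biased_fractions(families)
limit = {i: i * p / mu for i, p in pmf.items()}
for i in sorted(pmf):
    print(i, simulated[i], limit[i])
```

Note that a randomly chosen individual is more likely to come from a large family than a randomly chosen family is to be large, which is why the limit weights $p_i$ by $i$.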


The use of conditioning can also result in a more computationally efficient 
solution than a direct calculation. This is illustrated by our next example. 


Example 3.31 Consider $n$ independent trials in which each trial results in one
of the outcomes $1, \ldots, k$ with respective probabilities $p_1, \ldots, p_k$, $\sum_{i=1}^{k} p_i = 1$.
Suppose further that $n > k$, and that we are interested in determining the probability
that each outcome occurs at least once. If we let $A_i$ denote the event that
outcome $i$ does not occur in any of the $n$ trials, then the desired probability is
$1 - P(\bigcup_{i=1}^{k} A_i)$, and it can be obtained by using the inclusion-exclusion theorem
as follows:

$$\begin{aligned}
P\left(\bigcup_{i=1}^{k} A_i\right) &= \sum_{i=1}^{k} P(A_i) - \sum_{i} \sum_{j>i} P(A_i A_j) \\
&\quad + \sum_{i} \sum_{j>i} \sum_{l>j} P(A_i A_j A_l) - \cdots + (-1)^{k+1} P(A_1 \cdots A_k)
\end{aligned}$$

where

$$\begin{aligned}
P(A_i) &= (1 - p_i)^n \\
P(A_i A_j) &= (1 - p_i - p_j)^n, && i < j \\
P(A_i A_j A_l) &= (1 - p_i - p_j - p_l)^n, && i < j < l
\end{aligned}$$


The difficulty with the preceding solution is that its computation requires the
calculation of $2^k - 1$ terms, each of which is a quantity raised to the power $n$.
The preceding solution is thus computationally inefficient when $k$ is large. Let us
now see how to make use of conditioning to obtain an efficient solution.

To begin, note that if we start by conditioning on $N_k$ (the number of times
that outcome $k$ occurs) then when $N_k > 0$ the resulting conditional probability
will equal the probability that all of the outcomes $1, \ldots, k - 1$ occur at least once
when $n - N_k$ trials are performed, and each results in outcome $i$ with probability
$p_i / \sum_{j=1}^{k-1} p_j$, $i = 1, \ldots, k - 1$. We could then use a similar conditioning step on
these terms.

To follow through on the preceding idea, let $A_{m,r}$, for $m \le n$, $r \le k$, denote the
event that each of the outcomes $1, \ldots, r$ occurs at least once when $m$ independent
trials are performed, where each trial results in one of the outcomes $1, \ldots, r$
with respective probabilities $p_1/P_r, \ldots, p_r/P_r$, where $P_r = \sum_{j=1}^{r} p_j$. Let
$P(m, r) = P(A_{m,r})$ and note that $P(n, k)$ is the desired probability. To obtain an expression
for $P(m, r)$, condition on the number of times that outcome $r$ occurs. This gives


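$$\begin{aligned}
P(m, r) &= \sum_{j=0}^{m} P\{A_{m,r} \mid r \text{ occurs } j \text{ times}\} \binom{m}{j} \left(\frac{p_r}{P_r}\right)^j \left(1 - \frac{p_r}{P_r}\right)^{m-j} \\
&= \sum_{j=1}^{m-r+1} P(m - j, r - 1) \binom{m}{j} \left(\frac{p_r}{P_r}\right)^j \left(1 - \frac{p_r}{P_r}\right)^{m-j}
\end{aligned}$$

Starting with

$$\begin{aligned}
P(m, 1) &= 1, && \text{if } m \ge 1 \\
P(m, 1) &= 0, && \text{if } m = 0
\end{aligned}$$

we can use the preceding recursion to obtain the quantities $P(m, 2)$, $m = 2, \ldots, n - (k - 2)$,
and then the quantities $P(m, 3)$, $m = 3, \ldots, n - (k - 3)$,
and so on, up to $P(m, k - 1)$, $m = k - 1, \ldots, n - 1$. At this point we can then use
the recursion to compute $P(n, k)$. It is not difficult to check that the amount of
computation needed is a polynomial function of $k$, which will be much smaller
than $2^k$ when $k$ is large. ■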
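
The two approaches of Example 3.31 can be compared directly. The sketch below is an illustration, not a tuned implementation: the function names are ours, memoization via `functools.lru_cache` is one convenient way (an assumption, not the text's prescription) to organize the recursion for $P(m, r)$, and the probabilities used at the end are arbitrary test values.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations
from math import comb

def all_outcomes_prob(n, probs):
    """P(n, k): each of the k outcomes occurs at least once in n trials (recursion)."""
    prefix = [sum(probs[: r + 1]) for r in range(len(probs))]   # prefix[r-1] = P_r

    @lru_cache(maxsize=None)
    def P(m, r):
        if r == 1:
            return Fraction(1) if m >= 1 else Fraction(0)
        ratio = probs[r - 1] / prefix[r - 1]                    # p_r / P_r
        return sum(
            P(m - j, r - 1) * comb(m, j) * ratio ** j * (1 - ratio) ** (m - j)
            for j in range(1, m - r + 2)
        )

    return P(n, len(probs))

def all_outcomes_prob_incl_excl(n, probs):
    """Same probability by inclusion-exclusion over the 2^k - 1 nonempty subsets."""
    k = len(probs)
    union = Fraction(0)
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            union += (-1) ** (size + 1) * (1 - sum(probs[i] for i in subset)) ** n
    return 1 - union

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]        # illustrative values
print(all_outcomes_prob(5, probs), all_outcomes_prob_incl_excl(5, probs))
```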


As noted previously, conditional expectations given that $Y = y$ are exactly
the same as ordinary expectations except that all probabilities are computed
conditional on the event that $Y = y$. As such, conditional expectations satisfy all
the properties of ordinary expectations. For instance, the analog of

$$E[X] = \begin{cases} \sum_{w} E[X \mid W = w] P\{W = w\}, & \text{if } W \text{ is discrete} \\ \int_{w} E[X \mid W = w] f_W(w)\, dw, & \text{if } W \text{ is continuous} \end{cases}$$

is

$$E[X \mid Y = y] = \begin{cases} \sum_{w} E[X \mid W = w, Y = y] P\{W = w \mid Y = y\}, & \text{if } W \text{ is discrete} \\ \int_{w} E[X \mid W = w, Y = y] f_{W \mid Y}(w \mid y)\, dw, & \text{if } W \text{ is continuous} \end{cases}$$

If $E[X \mid Y, W]$ is defined to be that function of $Y$ and $W$ that, when $Y = y$ and
$W = w$, is equal to $E[X \mid Y = y, W = w]$, then the preceding can be written as

$$E[X \mid Y] = E[E[X \mid Y, W] \mid Y]$$


Example 3.32 An automobile insurance company classifies each of its policyholders
as being of one of the types $i = 1, \ldots, k$. It supposes that the numbers
of accidents that a type $i$ policyholder has in successive years are independent
Poisson random variables with mean $\lambda_i$, $i = 1, \ldots, k$. The probability that a
newly insured policyholder is type $i$ is $p_i$, $\sum_{i=1}^{k} p_i = 1$. Given that a policyholder
had $n$ accidents in her first year, what is the expected number that she has in her
second year? What is the conditional probability that she has $m$ accidents in her
second year?


Solution: Let $N_i$ denote the number of accidents the policyholder has in year
$i$, $i = 1, 2$. To obtain $E[N_2 \mid N_1 = n]$, condition on her risk type $T$.


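$$\begin{aligned}
E[N_2 \mid N_1 = n] &= \sum_{j=1}^{k} E[N_2 \mid T = j, N_1 = n] P\{T = j \mid N_1 = n\} \\
&= \sum_{j=1}^{k} E[N_2 \mid T = j] P\{T = j \mid N_1 = n\} \\
&= \sum_{j=1}^{k} \lambda_j P\{T = j \mid N_1 = n\} \\
&= \frac{\sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n+1} p_j}{\sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n} p_j}
\end{aligned}$$

where the final equality used that

$$\begin{aligned}
P\{T = j \mid N_1 = n\} &= \frac{P\{T = j, N_1 = n\}}{P\{N_1 = n\}} \\
&= \frac{P\{N_1 = n \mid T = j\} P\{T = j\}}{\sum_{j=1}^{k} P\{N_1 = n \mid T = j\} P\{T = j\}} \\
&= \frac{p_j e^{-\lambda_j} \lambda_j^n / n!}{\sum_{j=1}^{k} p_j e^{-\lambda_j} \lambda_j^n / n!}
\end{aligned}$$
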

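The conditional probability that the policyholder has $m$ accidents in year 2
given that she had $n$ in year 1 can also be obtained by conditioning on her type.

$$\begin{aligned}
P\{N_2 = m \mid N_1 = n\} &= \sum_{j=1}^{k} P\{N_2 = m \mid T = j, N_1 = n\} P\{T = j \mid N_1 = n\} \\
&= \sum_{j=1}^{k} e^{-\lambda_j} \frac{\lambda_j^m}{m!} P\{T = j \mid N_1 = n\} \\
&= \frac{\sum_{j=1}^{k} e^{-2\lambda_j} \lambda_j^{m+n} p_j}{m! \sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n} p_j}
\end{aligned}$$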
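
The conditional quantities just derived are straightforward to evaluate numerically. The following sketch is illustrative only: the function names are ours and the rates and type probabilities used at the end are hypothetical values, not data from the text.

```python
import math

def type_posterior(n, rates, priors):
    """P{T = j | N1 = n}; the common factor 1/n! cancels in the ratio."""
    weights = [p * math.exp(-lam) * lam ** n for lam, p in zip(rates, priors)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_second_year(n, rates, priors):
    """E[N2 | N1 = n] = sum_j lambda_j P{T = j | N1 = n}."""
    return sum(lam * w for lam, w in zip(rates, type_posterior(n, rates, priors)))

def second_year_pmf(m, n, rates, priors):
    """P{N2 = m | N1 = n}, obtained by conditioning on the type T."""
    post = type_posterior(n, rates, priors)
    return sum(math.exp(-lam) * lam ** m / math.factorial(m) * w
               for lam, w in zip(rates, post))

# Hypothetical portfolio: 70% low-risk (mean 0.5), 30% high-risk (mean 2.0).
rates, priors = [0.5, 2.0], [0.7, 0.3]
print(type_posterior(3, rates, priors))
print(expected_second_year(3, rates, priors))
```

With these illustrative numbers, three accidents in the first year shift most of the posterior weight to the high-risk type, which is what drives the larger expected count in the second year.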


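Another way to calculate $P\{N_2 = m \mid N_1 = n\}$ is first to write

$$P\{N_2 = m \mid N_1 = n\} = \frac{P\{N_2 = m, N_1 = n\}}{P\{N_1 = n\}}$$

and then determine both the numerator and denominator by conditioning on $T$.
This yields

$$\begin{aligned}
P\{N_2 = m \mid N_1 = n\} &= \frac{\sum_{j=1}^{k} P\{N_2 = m, N_1 = n \mid T = j\} p_j}{\sum_{j=1}^{k} P\{N_1 = n \mid T = j\} p_j} \\
&= \frac{\sum_{j=1}^{k} e^{-\lambda_j} \frac{\lambda_j^m}{m!} e^{-\lambda_j} \frac{\lambda_j^n}{n!} p_j}{\sum_{j=1}^{k} e^{-\lambda_j} \frac{\lambda_j^n}{n!} p_j} \\
&= \frac{\sum_{j=1}^{k} e^{-2\lambda_j} \lambda_j^{m+n} p_j}{m! \sum_{j=1}^{k} e^{-\lambda_j} \lambda_j^{n} p_j}
\end{aligned}$$

■

3.6 Some Applications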


3.6.1 A List Model 


Consider $n$ elements, $e_1, e_2, \ldots, e_n$, that are initially arranged in some ordered
list. At each unit of time a request is made for one of these elements, $e_i$ being
requested, independently of the past, with probability $P_i$. After being requested
the element is then moved to the front of the list. That is, for instance, if the
present ordering is $e_1, e_2, e_3, e_4$ and $e_3$ is requested, then the next ordering is
$e_3, e_1, e_2, e_4$.

We are interested in determining the expected position of the element requested
after this process has been in operation for a long time. However, before com- 
puting this expectation, let us note two possible applications of this model. In the 
first we have a stack of reference books. At each unit of time a book is randomly 
selected and is then returned to the top of the stack. In the second application we 
have a computer receiving requests for elements stored in its memory. The request 
probabilities for the elements may not be known, so to reduce the average time it 
takes the computer to locate the element requested (which is proportional to the 
position of the requested element if the computer locates the element by starting 
at the beginning and then going down the list), the computer is programmed to 
replace the requested element at the beginning of the list. 

To compute the expected position of the element requested, we start by con- 
ditioning on which element is selected. This yields 


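$$\begin{aligned}
E[\text{position of element requested}] &= \sum_{i=1}^{n} E[\text{position} \mid e_i \text{ is selected}] P_i \\
&= \sum_{i=1}^{n} E[\text{position of } e_i \mid e_i \text{ is selected}] P_i \\
&= \sum_{i=1}^{n} E[\text{position of } e_i] P_i \qquad (3.16)
\end{aligned}$$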
where the final equality used that the position of $e_i$ and the event that $e_i$ is selected
are independent because, regardless of its position, $e_i$ is selected with probability $P_i$.
Now,

$$\text{position of } e_i = 1 + \sum_{j \ne i} I_j$$

where

$$I_j = \begin{cases} 1, & \text{if } e_j \text{ precedes } e_i \\ 0, & \text{otherwise} \end{cases}$$

and so,

$$\begin{aligned}
E[\text{position of } e_i] &= 1 + \sum_{j \ne i} E[I_j] \\
&= 1 + \sum_{j \ne i} P\{e_j \text{ precedes } e_i\} \qquad (3.17)
\end{aligned}$$
To compute $P\{e_j \text{ precedes } e_i\}$, note that $e_j$ will precede $e_i$ if
the most recent request for either of them was for $e_j$. But given that a request
is for either $e_i$ or $e_j$, the probability that it is for $e_j$ is

\[
P\{e_j \mid e_i \text{ or } e_j\} = \frac{P_j}{P_i + P_j}
\]

and, thus,

\[
P\{e_j \text{ precedes } e_i\} = \frac{P_j}{P_i + P_j}
\]

Hence, from Equations (3.16) and (3.17) we see that

\[
E[\text{position of element requested}] = 1 + \sum_{i=1}^{n} P_i \sum_{j \ne i} \frac{P_j}{P_i + P_j}
\]
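The closed form above is easy to evaluate numerically. The following is a minimal sketch, assuming the request probabilities are supplied as a Python list; the function name `expected_position_move_to_front` is ours, not the text's.

```python
from fractions import Fraction


def expected_position_move_to_front(probs):
    """Long-run expected position of the requested element.

    Evaluates 1 + sum_i P_i * sum_{j != i} P_j / (P_i + P_j).
    """
    total = 0
    for i, p_i in enumerate(probs):
        # sum over j != i of P{e_j precedes e_i} = P_j / (P_i + P_j)
        ahead = sum(p_j / (p_i + p_j) for j, p_j in enumerate(probs) if j != i)
        total += p_i * ahead
    return 1 + total


# Uniform requests: every ordering is equally likely, so the answer is (n + 1) / 2.
print(expected_position_move_to_front([Fraction(1, 4)] * 4))
# Skewed requests: the popular element tends to sit near the front.
print(expected_position_move_to_front([0.7, 0.1, 0.1, 0.1]))
```

With equal request probabilities the rule gives no advantage over a fixed order; the more skewed the probabilities, the smaller the expected position.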


This list model will be further analyzed in Section 4.8, where we will assume a 
different reordering rule—namely, that the element requested is moved one closer 
to the front of the list as opposed to being moved to the front of the list as assumed 
here. We will show there that the average position of the requested element is 
less under the one-closer rule than it is under the front-of-the-line rule. 


3.6.2 A Random Graph 


A graph consists of a set V of elements called nodes and a set A of pairs of 
elements of V called arcs. A graph can be represented graphically by drawing 
circles for nodes and drawing lines between nodes i and j whenever (i, j) is an
arc. For instance if V = {1,2,3,4} and A = {(1, 2), (1,4), (2, 3), (1, 2), (3, 3)}, 
then we can represent this graph as shown in Figure 3.1. Note that the arcs have 
no direction (a graph in which the arcs are ordered pairs of nodes is called a 
directed graph); and that in the figure there are multiple arcs connecting nodes 1 
and 2, and a self-arc (called a self-loop) from node 3 to itself. 

We say that there exists a path from node $i$ to node $j$, $i \ne j$, if there exists
a sequence of nodes $i, i_1, \ldots, i_k, j$ such that
$(i, i_1), (i_1, i_2), \ldots, (i_k, j)$ are all arcs.

Figure 3.1 A graph.




Figure 3.2 A disconnected graph. 


If there is a path between each of the $\binom{n}{2}$ distinct pairs of nodes we say
that the graph is connected. The graph in Figure 3.1 is connected but the graph in
Figure 3.2 is not. Consider now the following graph where $V = \{1, 2, \ldots, n\}$ and
$A = \{(i, X(i)),\ i = 1, \ldots, n\}$ where the $X(i)$ are independent random variables
such that

\[
P\{X(i) = j\} = \frac{1}{n}, \qquad j = 1, 2, \ldots, n
\]

In other words, from each node $i$ we select at random one of the $n$ nodes (including
possibly the node $i$ itself) and then join node $i$ and the selected node with an arc.
Such a graph is commonly referred to as a random graph.

We are interested in determining the probability that the random graph so
obtained is connected. As a prelude, starting at some node (say, node 1) let us
follow the sequence of nodes $1, X(1), X^2(1), \ldots$, where $X^n(1) = X(X^{n-1}(1))$;
and define $N$ to equal the first $k$ such that $X^k(1)$ is not a new node. In other
words,

\[
N = \text{first } k \text{ such that } X^k(1) \in \{1, X(1), \ldots, X^{k-1}(1)\}
\]

We can represent this as shown in Figure 3.3, where the arc from $X^{N-1}(1)$ goes
back to a node previously visited.

To obtain the probability that the graph is connected we first condition on N 
to obtain 


\[
P\{\text{graph is connected}\} = \sum_{k=1}^{n} P\{\text{connected} \mid N = k\}\,P\{N = k\}
\tag{3.18}
\]

Figure 3.3


Now, given that $N = k$, the $k$ nodes $1, X(1), \ldots, X^{k-1}(1)$ are connected to each
other, and there are no other arcs emanating out of these nodes. In other words,
if we regard these $k$ nodes as being one supernode, the situation is the same as if
we had one supernode and $n - k$ ordinary nodes with arcs emanating from the
ordinary nodes, each arc going into the supernode with probability $k/n$. The
solution in this situation is obtained from Lemma 3.2 by taking $r = n - k$.


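Lemma 3.2 Given a random graph consisting of nodes $0, 1, \ldots, r$ and $r$ arcs,
namely, $(i, Y_i)$, $i = 1, \ldots, r$, where

\[
Y_i = \begin{cases}
j & \text{with probability } \dfrac{1}{r+k}, \quad j = 1, \ldots, r \\[1ex]
0 & \text{with probability } \dfrac{k}{r+k}
\end{cases}
\]

then

\[
P\{\text{graph is connected}\} = \frac{k}{r+k}
\]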


(In other words, for the preceding graph there are r + 1 nodes—r ordinary 
nodes and one supernode. Out of each ordinary node an arc is chosen. The arc 
goes to the supernode with probability k/(r + k) and to each of the ordinary ones 
with probability 1/(r + k). There is no arc emanating out of the supernode.) 


Proof. The proof is by induction on $r$. As it is true when $r = 1$ for any $k$, assume
it true for all values less than $r$. Now, in the case under consideration, let us first
condition on the number of arcs $(j, Y_j)$ for which $Y_j = 0$. This yields

\[
P\{\text{connected}\}
= \sum_{i=0}^{r} P\{\text{connected} \mid i \text{ of the } Y_j = 0\}
\binom{r}{i} \left(\frac{k}{r+k}\right)^{i} \left(\frac{r}{r+k}\right)^{r-i}
\tag{3.19}
\]


Now, given that exactly i of the arcs are into the supernode (see Figure 3.4), the 
situation for the remaining r — i arcs which do not go into the supernode is the 
same as if we had r — i ordinary nodes and one supernode with an arc going 
out of each of the ordinary nodes—into the supernode with probability i/r and 
into each ordinary node with probability 1/r. But by the induction hypothesis 
the probability that this would lead to a connected graph is i/r. 
Figure 3.4 The situation given that i of the r arcs are into the supernode.


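Hence,

\[
P\{\text{connected} \mid i \text{ of the } Y_j = 0\} = \frac{i}{r}
\]

and from Equation (3.19)

\[
\begin{aligned}
P\{\text{connected}\}
&= \sum_{i=0}^{r} \frac{i}{r} \binom{r}{i} \left(\frac{k}{r+k}\right)^{i} \left(\frac{r}{r+k}\right)^{r-i} \\
&= \frac{1}{r}\, E\!\left[\text{binomial}\left(r, \frac{k}{r+k}\right)\right] \\
&= \frac{k}{r+k}
\end{aligned}
\]

which completes the proof of the lemma. ∎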


Hence, as the situation given $N = k$ is exactly as described by Lemma 3.2 when
$r = n - k$, we see that, for the original graph,

\[
P\{\text{graph is connected} \mid N = k\} = \frac{k}{n}
\]

and, from Equation (3.18),

\[
P\{\text{graph is connected}\} = \frac{E(N)}{n}
\tag{3.20}
\]


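To compute $E(N)$ we use the identity

\[
E(N) = \sum_{i=1}^{\infty} P\{N \ge i\}
\]

which can be proved by defining indicator variables $I_i$, $i \ge 1$, by

\[
I_i = \begin{cases} 1, & \text{if } i \le N \\ 0, & \text{if } i > N \end{cases}
\]

Hence, $N = \sum_{i=1}^{\infty} I_i$ and so

\[
E(N) = E\left[\sum_{i=1}^{\infty} I_i\right] = \sum_{i=1}^{\infty} E[I_i] = \sum_{i=1}^{\infty} P\{N \ge i\}
\tag{3.21}
\]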


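Now, the event $\{N \ge i\}$ occurs if the nodes $1, X(1), \ldots, X^{i-1}(1)$ are all
distinct. Hence,

\[
P\{N \ge i\} = \frac{(n-1)}{n}\,\frac{(n-2)}{n} \cdots \frac{(n-i+1)}{n}
= \frac{(n-1)!}{(n-i)!\,n^{i-1}}
\]

and so, from Equations (3.20) and (3.21),

\[
\begin{aligned}
P\{\text{graph is connected}\}
&= (n-1)! \sum_{i=1}^{n} \frac{1}{(n-i)!\,n^{i}} \\
&= \frac{(n-1)!}{n^{n}} \sum_{j=0}^{n-1} \frac{n^{j}}{j!} \qquad (\text{by } j = n - i)
\end{aligned}
\tag{3.22}
\]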


We can also use Equation (3.22) to obtain a simple approximate expression
for the probability that the graph is connected when $n$ is large. To do so, we first
note that if $X$ is a Poisson random variable with mean $n$, then

\[
P\{X < n\} = e^{-n} \sum_{j=0}^{n-1} \frac{n^{j}}{j!}
\]

Since a Poisson random variable with mean $n$ can be regarded as being the sum
of $n$ independent Poisson random variables each with mean 1, it follows from the
central limit theorem that for $n$ large such a random variable has approximately
a normal distribution and as such has probability $\frac{1}{2}$ of being less than its
mean. That is, for $n$ large,

\[
P\{X < n\} \approx \frac{1}{2}
\]

and so for $n$ large,

\[
\sum_{j=0}^{n-1} \frac{n^{j}}{j!} \approx \frac{e^{n}}{2}
\]

Hence, from Equation (3.22), for $n$ large,

\[
P\{\text{graph is connected}\} \approx \frac{e^{n}(n-1)!}{2n^{n}}
\]


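By employing an approximation due to Stirling that states that for $n$ large,

\[
n! \approx n^{n+1/2} e^{-n} \sqrt{2\pi}
\]

we see that, for $n$ large,

\[
P\{\text{graph is connected}\} \approx \sqrt{\frac{\pi}{2(n-1)}}\; e \left(\frac{n-1}{n}\right)^{n}
\]

and as

\[
\lim_{n \to \infty} \left(\frac{n-1}{n}\right)^{n} = \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^{n} = e^{-1}
\]

we see that, for $n$ large,

\[
P\{\text{graph is connected}\} \approx \sqrt{\frac{\pi}{2(n-1)}}
\]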

Now a graph is said to consist of $r$ connected components if its nodes can be
partitioned into $r$ subsets so that each of the subsets is connected and there are
no arcs between nodes in different subsets. For instance, the graph in Figure 3.5
consists of three connected components, namely, {1, 2, 3}, {4, 5}, and {6}.

Figure 3.5 A graph having three connected components.

Let $C$ denote the number of connected components of our random graph and let

\[
P_n(i) = P\{C = i\}
\]


where we use the notation $P_n(i)$ to make explicit the dependence on $n$, the number
of nodes. Since a connected graph is by definition a graph consisting of exactly
one component, from Equation (3.22) we have

\[
\begin{aligned}
P_n(1) &= P\{C = 1\} \\
&= \frac{(n-1)!}{n^{n}} \sum_{j=0}^{n-1} \frac{n^{j}}{j!}
\end{aligned}
\tag{3.23}
\]


To obtain $P_n(2)$, the probability of exactly two components, let us first fix
attention on some particular node, say node 1. In order that a given set of $k - 1$
other nodes (say, nodes $2, \ldots, k$) will along with node 1 constitute one
connected component, and the remaining $n - k$ a second connected component, we
must have

(i) $X(i) \in \{1, 2, \ldots, k\}$, for all $i = 1, \ldots, k$.

(ii) $X(i) \in \{k+1, \ldots, n\}$, for all $i = k+1, \ldots, n$.

(iii) The nodes $1, 2, \ldots, k$ form a connected subgraph.

(iv) The nodes $k+1, \ldots, n$ form a connected subgraph.

The probability of the preceding occurring is clearly

\[
\left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} P_k(1)\,P_{n-k}(1)
\]

and because there are $\binom{n-1}{k-1}$ ways of choosing a set of $k - 1$ nodes from
the nodes 2 through $n$, we have

\[
P_n(2) = \sum_{k=1}^{n-1} \binom{n-1}{k-1} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} P_k(1)\,P_{n-k}(1)
\]
and so $P_n(2)$ can be computed from Equation (3.23). In general, the recursive
formula for $P_n(i)$ is given by

\[
P_n(i) = \sum_{k=1}^{n-i+1} \binom{n-1}{k-1} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} P_k(1)\,P_{n-k}(i-1)
\]

To compute $E[C]$, the expected number of connected components, first note that
every connected component of our random graph must contain exactly one cycle
(a cycle is a set of arcs of the form $(i, i_1), (i_1, i_2), \ldots, (i_{k-1}, i_k), (i_k, i)$
for distinct nodes $i, i_1, \ldots, i_k$). For example, Figure 3.6 depicts a cycle.

Figure 3.6 A cycle.

The fact that every connected component of our random graph must contain 
exactly one cycle is most easily proved by noting that if the connected component 
consists of r nodes, then it must also have r arcs and, hence, must contain exactly 
one cycle (why?). Thus, we see that 


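\[
\begin{aligned}
E[C] &= E[\text{number of cycles}] \\
&= E\left[\sum_{S} I(S)\right] \\
&= \sum_{S} E[I(S)]
\end{aligned}
\]

where the sum is over all subsets $S \subset \{1, 2, \ldots, n\}$ and

\[
I(S) = \begin{cases} 1, & \text{if the nodes in } S \text{ are all the nodes of a cycle} \\ 0, & \text{otherwise} \end{cases}
\]

Now, if $S$ consists of $k$ nodes, say $1, \ldots, k$, then

\[
\begin{aligned}
E[I(S)] &= P\{1, X(1), \ldots, X^{k-1}(1) \text{ are all distinct and contained in } 1, \ldots, k \text{ and } X^{k}(1) = 1\} \\
&= \frac{k-1}{n}\,\frac{k-2}{n} \cdots \frac{1}{n}\,\frac{1}{n} = \frac{(k-1)!}{n^{k}}
\end{aligned}
\]

Hence, because there are $\binom{n}{k}$ subsets of size $k$ we see that

\[
E[C] = \sum_{k=1}^{n} \binom{n}{k} \frac{(k-1)!}{n^{k}}
\]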
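The recursion for $P_n(i)$ and the formula for $E[C]$ can be cross-checked against each other, since the mean of the distribution $P_n(1), \ldots, P_n(n)$ must equal $E[C]$. The following sketch does this in exact rational arithmetic; the function names and the use of memoization are our own choices.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def p_components(n, i):
    """P_n(i): probability that the random graph on n nodes has i components."""
    if i < 1 or i > n:
        return Fraction(0)
    if i == 1:
        # Equation (3.23)
        s = sum(Fraction(n**j, factorial(j)) for j in range(n))
        return Fraction(factorial(n - 1), n**n) * s
    total = Fraction(0)
    # k is the size of the component that contains node 1
    for k in range(1, n - i + 2):
        total += (comb(n - 1, k - 1)
                  * Fraction(k, n) ** k
                  * Fraction(n - k, n) ** (n - k)
                  * p_components(k, 1)
                  * p_components(n - k, i - 1))
    return total


def expected_components(n):
    """E[C] = sum_{k=1}^{n} C(n, k) (k-1)! / n^k."""
    return sum(Fraction(comb(n, k) * factorial(k - 1), n**k)
               for k in range(1, n + 1))


n = 6
dist = [p_components(n, i) for i in range(1, n + 1)]
assert sum(dist) == 1
assert sum(i * p for i, p in enumerate(dist, start=1)) == expected_components(n)
print([float(p) for p in dist], float(expected_components(n)))
```

Using `Fraction` keeps the comparison exact, so agreement is not an artifact of floating-point rounding.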


3.6.3 Uniform Priors, Polya’s Urn Model, and Bose-Einstein Statistics 


Suppose that $n$ independent trials, each of which is a success with probability $p$,
are performed. If we let $X$ denote the total number of successes, then $X$ is a
binomial random variable such that

\[
P\{X = k \mid p\} = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n
\]


However, let us now suppose that whereas the trials all have the same success
probability $p$, its value is not predetermined but is chosen according to a uniform
distribution on (0, 1). (For instance, a coin may be chosen at random from a huge
bin of coins representing a uniform spread over all possible values of $p$, the coin’s
probability of coming up heads. The chosen coin is then flipped $n$ times.) In this
case, by conditioning on the actual value of $p$, we have

\[
\begin{aligned}
P\{X = k\} &= \int_{0}^{1} P\{X = k \mid p\}\, f(p)\, dp \\
&= \int_{0}^{1} \binom{n}{k} p^{k} (1-p)^{n-k}\, dp
\end{aligned}
\]


Now, it can be shown that

\[
\int_{0}^{1} p^{k} (1-p)^{n-k}\, dp = \frac{k!\,(n-k)!}{(n+1)!}
\tag{3.24}
\]

and thus

\[
\begin{aligned}
P\{X = k\} &= \binom{n}{k} \frac{k!\,(n-k)!}{(n+1)!} \\
&= \frac{1}{n+1}, \qquad k = 0, 1, \ldots, n
\end{aligned}
\tag{3.25}
\]

In other words, each of the $n + 1$ possible values of $X$ is equally likely.

As an alternate way of describing the preceding experiment, let us compute the
conditional probability that the $(r + 1)$st trial will result in a success given a total
of $k$ successes (and $r - k$ failures) in the first $r$ trials.

\[
\begin{aligned}
P\{(r+1)\text{st trial is a success} \mid k \text{ successes in first } r\}
&= \frac{P\{(r+1)\text{st is a success},\ k \text{ successes in first } r \text{ trials}\}}{P\{k \text{ successes in first } r \text{ trials}\}} \\
&= \frac{\int_{0}^{1} P\{(r+1)\text{st is a success},\ k \text{ in first } r \mid p\}\, dp}{1/(r+1)} \\
&= (r+1) \int_{0}^{1} \binom{r}{k} p^{k+1} (1-p)^{r-k}\, dp \\
&= (r+1) \binom{r}{k} \frac{(k+1)!\,(r-k)!}{(r+2)!} \qquad \text{by Equation (3.24)} \\
&= \frac{k+1}{r+2}
\end{aligned}
\tag{3.26}
\]


That is, if the first r trials result in k successes, then the next trial will be a success 
with probability (k + 1)/(r + 2). 
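Identity (3.24) and the two results that rest on it, (3.25) and (3.26), can be verified exactly for small values. The sketch below evaluates the integral by expanding $(1-p)^b$ with the binomial theorem and integrating term by term; this expansion and the function names are our own illustrative choices, not the derivation the text has in mind.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of p^a (1-p)^b over (0, 1) for nonnegative integers a, b."""
    # (1-p)^b = sum_i C(b, i) (-p)^i, and the integral of p^(a+i) is 1/(a+i+1)
    return sum(Fraction((-1) ** i * comb(b, i), a + i + 1) for i in range(b + 1))


def prob_k_successes(n, k):
    """P{X = k} when p is uniform on (0, 1)."""
    return comb(n, k) * beta_integral(k, n - k)


def prob_next_success(r, k):
    """P{(r+1)st trial is a success | k successes in the first r trials}."""
    return comb(r, k) * beta_integral(k + 1, r - k) / prob_k_successes(r, k)


for n in range(8):
    for k in range(n + 1):
        assert beta_integral(k, n - k) == Fraction(
            factorial(k) * factorial(n - k), factorial(n + 1))  # (3.24)
        assert prob_k_successes(n, k) == Fraction(1, n + 1)      # (3.25)
        assert prob_next_success(n, k) == Fraction(k + 1, n + 2)  # (3.26)
print(prob_next_success(2, 2))
```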

It follows from Equation (3.26) that the stochastic process of the successive
outcomes of the trials can alternatively be described as follows: There
is an urn that initially contains one white and one black ball. At each stage a ball 
is randomly drawn and is then replaced along with another ball of the same color. 
Thus, for instance, if of the first r balls drawn, k were white, then the urn at the 
time of the (r + 1)th draw would consist of k + 1 white and r—k + 1 black, and 
thus the next ball would be white with probability (k + 1)/(r + 2). If we identify 
the drawing of a white ball with a successful trial, then we see that this yields an 
alternate description of the original model. This latter urn model is called Polya’s 
urn model. 


Remarks 


(i) In the special case when k = r, Equation (3.26) is sometimes called Laplace’s rule 
of succession, after the French mathematician Pierre de Laplace. In Laplace’s era, 
this “rule” provoked much controversy, for people attempted to employ it in diverse 
situations where its validity was questionable. For instance, it was used to justify 
such propositions as “If you have dined twice at a restaurant and both meals were
good, then the next meal also will be good with probability 3/4,” and “Since the
sun has risen the past 1,826,213 days, so will it rise tomorrow with probability 
1,826,214/1,826,215.” The trouble with such claims resides in the fact that it is 
not at all clear the situation they are describing can be modeled as consisting of 
independent trials having a common probability of success that is itself uniformly 
chosen. 

(ii) In the original description of the experiment, we referred to the successive trials as 
being independent, and in fact they are independent when the success probability is 
known. However, when p is regarded as a random variable, the successive outcomes 
are no longer independent because knowing whether an outcome is a success or not
gives us some information about p, which in turn yields information about the other 
outcomes. 


The preceding can be generalized to situations in which each trial has more
than two possible outcomes. Suppose that $n$ independent trials, each resulting
in one of $m$ possible outcomes $1, \ldots, m$, with respective probabilities
$p_1, \ldots, p_m$ are performed. If we let $X_i$ denote the number of type $i$ outcomes
that result in the $n$ trials, $i = 1, \ldots, m$, then the vector $X_1, \ldots, X_m$ will
have the multinomial distribution given by

\[
P\{X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m \mid \mathbf{p}\}
= \frac{n!}{x_1! \cdots x_m!}\, p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m}
\]

where $x_1, \ldots, x_m$ is any vector of nonnegative integers that sum to $n$. Now let us
suppose that the vector $\mathbf{p} = (p_1, \ldots, p_m)$ is not specified, but instead is
chosen by a “uniform” distribution. Such a distribution would be of the form

\[
f(p_1, \ldots, p_m) = \begin{cases}
c, & 0 \le p_i \le 1,\ i = 1, \ldots, m,\ \sum_{1}^{m} p_i = 1 \\
0, & \text{otherwise}
\end{cases}
\]


The preceding multivariate distribution is a special case of what is known as 
the Dirichlet distribution, and it is not difficult to show, using the fact that the 
distribution must integrate to 1, that $c = (m - 1)!$.

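The unconditional distribution of the vector $\mathbf{X}$ is given by

\[
\begin{aligned}
P\{X_1 = x_1, \ldots, X_m = x_m\}
&= \int\!\!\int \cdots \int P\{X_1 = x_1, \ldots, X_m = x_m \mid p_1, \ldots, p_m\}\, f(p_1, \ldots, p_m)\, dp_1 \cdots dp_m \\
&= \frac{(m-1)!\,n!}{x_1! \cdots x_m!}
\mathop{\int\!\!\int \cdots \int}_{\substack{0 \le p_i \le 1 \\ \sum_{1}^{m} p_i = 1}}
p_1^{x_1} \cdots p_m^{x_m}\, dp_1 \cdots dp_m
\end{aligned}
\]

Now it can be shown that

\[
\mathop{\int\!\!\int \cdots \int}_{\substack{0 \le p_i \le 1 \\ \sum_{1}^{m} p_i = 1}}
p_1^{x_1} \cdots p_m^{x_m}\, dp_1 \cdots dp_m
= \frac{x_1! \cdots x_m!}{\left(\sum_{1}^{m} x_i + m - 1\right)!}
\tag{3.27}
\]

and thus, using the fact that $\sum_{1}^{m} x_i = n$, we see that

\[
\begin{aligned}
P\{X_1 = x_1, \ldots, X_m = x_m\} &= \frac{n!\,(m-1)!}{(n+m-1)!} \\
&= \binom{n+m-1}{m-1}^{-1}
\end{aligned}
\tag{3.28}
\]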
Hence, all of the $\binom{n+m-1}{m-1}$ possible outcomes (there are
$\binom{n+m-1}{m-1}$ possible nonnegative integer valued solutions of
$x_1 + \cdots + x_m = n$) of the vector $(X_1, \ldots, X_m)$ are equally likely. The
probability distribution given by Equation (3.28) is sometimes called the
Bose-Einstein distribution.

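To obtain an alternative description of the foregoing, let us compute the
conditional probability that the $(n + 1)$st outcome is of type $j$ if the first $n$
trials have resulted in $x_i$ type $i$ outcomes, $i = 1, \ldots, m$, $\sum_{1}^{m} x_i = n$.
This is given by

\[
\begin{aligned}
P\{(n+1)\text{st is } j \mid x_i \text{ type } i \text{ in first } n,\ i = 1, \ldots, m\}
&= \frac{P\{(n+1)\text{st is } j,\ x_i \text{ type } i \text{ in first } n,\ i = 1, \ldots, m\}}{P\{x_i \text{ type } i \text{ in first } n,\ i = 1, \ldots, m\}} \\
&= \frac{\dfrac{n!\,(m-1)!}{x_1! \cdots x_m!} \displaystyle\int\!\!\int \cdots \int p_1^{x_1} \cdots p_j^{x_j+1} \cdots p_m^{x_m}\, dp_1 \cdots dp_m}{\dbinom{n+m-1}{m-1}^{-1}}
\end{aligned}
\]

where the numerator is obtained by conditioning on the $\mathbf{p}$ vector and the
denominator is obtained by using Equation (3.28). By Equation (3.27), we have

\[
\begin{aligned}
P\{(n+1)\text{st is } j \mid x_i \text{ type } i \text{ in first } n,\ i = 1, \ldots, m\}
&= \frac{\dfrac{(x_j + 1)\,n!\,(m-1)!}{(n+m)!}}{\dfrac{(m-1)!\,n!}{(n+m-1)!}} \\
&= \frac{x_j + 1}{n + m}
\end{aligned}
\tag{3.29}
\]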


Using Equation (3.29), we can now present an urn model description of the
stochastic process of successive outcomes. Namely, consider an urn that initially
contains one of each of $m$ types of balls. Balls are then randomly drawn and are
replaced along with another of the same type. Hence, if in the first $n$ drawings
there have been a total of $x_j$ type $j$ balls drawn, then the urn immediately before the
$(n + 1)$st draw will contain $x_j + 1$ type $j$ balls out of a total of $m + n$, and so the
probability of a type $j$ on the $(n + 1)$st draw will be given by Equation (3.29).
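The urn description lends itself to a direct check of the Bose-Einstein distribution (3.28): propagating the exact distribution of the count vector draw by draw should leave every reachable vector with the same probability. The sketch below does this with rational arithmetic; the function name and the dictionary representation are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def polya_count_distribution(m, n):
    """Exact distribution of (X_1, ..., X_m) after n draws from the m-type urn."""
    dist = {(0,) * m: Fraction(1)}
    for draw in range(n):
        nxt = {}
        for counts, prob in dist.items():
            for j in range(m):
                # the urn holds counts[j] + 1 type-j balls out of m + draw in total
                p_j = Fraction(counts[j] + 1, m + draw)
                new = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                nxt[new] = nxt.get(new, Fraction(0)) + prob * p_j
        dist = nxt
    return dist


m, n = 3, 5
dist = polya_count_distribution(m, n)
n_outcomes = comb(n + m - 1, m - 1)
assert len(dist) == n_outcomes
assert all(p == Fraction(1, n_outcomes) for p in dist.values())
print(n_outcomes, "equally likely count vectors")
```

Taking m = 2 recovers the two-color urn of the previous discussion, where each of the n + 1 possible numbers of white balls has probability 1/(n + 1).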


Remark Consider a situation where $n$ particles are to be distributed at random
among $m$ possible regions; and suppose that the regions appear, at least before
the experiment, to have the same physical characteristics. It would thus seem that
the most likely distribution for the number of particles that fall into each of the
regions is the multinomial distribution with $p_i \equiv 1/m$. (This, of course, would
correspond to each particle, independent of the others, being equally likely to fall
in any of the $m$ regions.) Physicists studying how particles distribute themselves
observed the behavior of such particles as photons and atoms containing an 
even number of elementary particles. However, when they studied the resulting 
data, they were amazed to discover that the observed frequencies did not fol- 
low the multinomial distribution but rather seemed to follow the Bose-Einstein 
distribution. They were amazed because they could not imagine a physical model 
for the distribution of particles that would result in all possible outcomes being 
equally likely. (For instance, if 10 particles are to distribute themselves between 
two regions, it hardly seems reasonable that it is just as likely that both regions 
will contain 5 particles as it is that all 10 will fall in region 1 or that all 10 will 
fall in region 2.) 

However, from the results of this section we now have a better understanding 
of the cause of the physicists’ dilemma. In fact, two possible hypotheses present 
themselves. First, it may be that the data gathered by the physicists were actually 
obtained under a variety of different situations, each having its own characteristic 
p vector that gave rise to a uniform spread over all possible p vectors. A second 
possibility (suggested by the urn model interpretation) is that the particles select 
their regions sequentially and a given particle’s probability of falling in a region is 
roughly proportional to the fraction of the landed particles that are in that region. 
(In other words, the particles presently in a region provide an “attractive” force 
on elements that have not yet landed.) 


3.6.4 Mean Time for Patterns 


Let $\mathbf{X} = (X_1, X_2, \ldots)$ be a sequence of independent and identically
distributed discrete random variables such that

\[
p_j = P\{X_n = j\}
\]

For a given subsequence, or pattern, $i_1, \ldots, i_n$ let $T = T(i_1, \ldots, i_n)$ denote
the number of random variables that we need to observe until the pattern appears.
For instance, if the subsequence of interest is 3, 5, 1 and the sequence is
$\mathbf{X} = (5, 3, 1, 3, 5, 3, 5, 1, 6, 2, \ldots)$ then $T = 8$. We want to determine $E[T]$.

To begin, let us consider whether the pattern has an overlap, where we say that
the pattern $i_1, i_2, \ldots, i_n$ has an overlap if for some $k$, $1 \le k < n$, the sequence
of its final $k$ elements is the same as that of its first $k$ elements. That is, it has an
overlap if for some $1 \le k < n$,

\[
(i_{n-k+1}, \ldots, i_n) = (i_1, \ldots, i_k), \qquad k < n
\]

For instance, the pattern 3, 5, 1 has no overlaps, whereas the pattern 3, 3, 3 does.


Case 1. The pattern has no overlaps.
In this case we will argue that $T$ will equal $j + n$ if and only if the pattern
does not occur within the first $j$ values, and the next $n$ values are $i_1, \ldots, i_n$.
That is,

\[
T = j + n \iff \{T > j,\ (X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)\}
\tag{3.30}
\]

To verify (3.30), note first that $T = j + n$ clearly implies both that $T > j$ and
that $(X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)$. On the other hand, suppose that

\[
T > j \quad \text{and} \quad (X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)
\tag{3.31}
\]

Let $k < n$. Because $(i_1, \ldots, i_k) \ne (i_{n-k+1}, \ldots, i_n)$, it follows that
$T \ne j + k$. But (3.31) implies that $T \le j + n$, so we can conclude that $T = j + n$.
Thus we have verified (3.30).

Using (3.30), we see that

\[
P\{T = j + n\} = P\{T > j,\ (X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)\}
\]

However, whether $T > j$ is determined by the values $X_1, \ldots, X_j$, and is thus
independent of $X_{j+1}, \ldots, X_{j+n}$. Consequently,

\[
\begin{aligned}
P\{T = j + n\} &= P\{T > j\}\, P\{(X_{j+1}, \ldots, X_{j+n}) = (i_1, \ldots, i_n)\} \\
&= P\{T > j\}\, p
\end{aligned}
\]

where

\[
p = p_{i_1} p_{i_2} \cdots p_{i_n}
\]

Summing both sides of the preceding over all $j$ yields

\[
1 = \sum_{j=0}^{\infty} P\{T = j + n\} = p \sum_{j=0}^{\infty} P\{T > j\} = p\,E[T]
\]

or

\[
E[T] = \frac{1}{p}
\]

Case 2. The pattern has overlaps.

For patterns having overlaps there is a simple trick that will enable us to obtain
$E[T]$ by making use of the result for nonoverlapping patterns. To make the
analysis more transparent, consider a specific pattern, say $P = (3, 5, 1, 3, 5)$. Let $x$
be a value that does not appear in the pattern, and let $T_x$ denote the time until
the pattern $P_x = (3, 5, 1, 3, 5, x)$ appears. That is, $T_x$ is the time of occurrence of
the new pattern that puts $x$ at the end of the original pattern. Because $x$ did not
appear in the original pattern it follows that the new pattern has no overlaps;
thus,

\[
E[T_x] = \frac{1}{p_x\, p}
\]

where $p = \prod_{j=1}^{n} p_{i_j} = p_3^2\, p_5^2\, p_1$. Because the new pattern can occur
only after the original one, write

\[
T_x = T + A
\]

where $T$ is the time at which the pattern $P = (3, 5, 1, 3, 5)$ occurs, and $A$ is the
additional time after the occurrence of the pattern $P$ until $P_x$ occurs. Also, let
$E[T_x \mid i_1, \ldots, i_r]$ denote the expected additional time after time $r$ until the
pattern $P_x$ appears given that the first $r$ data values are $i_1, \ldots, i_r$. Conditioning
on $X$, the next data value after the occurrence of the pattern (3, 5, 1, 3, 5), gives

\[
E[A \mid X = i] = \begin{cases}
1 + E[T_x \mid 3, 5, 1], & \text{if } i = 1 \\
1 + E[T_x \mid 3], & \text{if } i = 3 \\
1, & \text{if } i = x \\
1 + E[T_x], & \text{if } i \ne 1, 3, x
\end{cases}
\]

Therefore,

\[
\begin{aligned}
E[T_x] &= E[T] + E[A] \\
&= E[T] + 1 + E[T_x \mid 3, 5, 1]\,p_1 + E[T_x \mid 3]\,p_3 + E[T_x](1 - p_1 - p_3 - p_x)
\end{aligned}
\tag{3.32}
\]


But

\[
E[T_x] = E[T(3, 5, 1)] + E[T_x \mid 3, 5, 1]
\]

giving

\[
E[T_x \mid 3, 5, 1] = E[T_x] - E[T(3, 5, 1)]
\]

Similarly,

\[
E[T_x \mid 3] = E[T_x] - E[T(3)]
\]

Substituting back into Equation (3.32) gives

\[
p_x E[T_x] = E[T] + 1 - p_1 E[T(3, 5, 1)] - p_3 E[T(3)]
\]

But, by the result in the nonoverlapping case,

\[
E[T(3, 5, 1)] = \frac{1}{p_3 p_5 p_1}, \qquad E[T(3)] = \frac{1}{p_3}
\]

yielding the result

\[
E[T] = p_x E[T_x] + \frac{1}{p_3 p_5} = \frac{1}{p} + \frac{1}{p_3 p_5}
\]


For another illustration of the technique, let us reconsider Example 3.15, which
is concerned with finding the expected time until $n$ consecutive successes occur
in independent Bernoulli trials. That is, we want $E[T]$, when the pattern is
$P = (1, 1, \ldots, 1)$. Then, with $x \ne 1$ we consider the nonoverlapping pattern
$P_x = (1, \ldots, 1, x)$, and let $T_x$ be its occurrence time. With $A$ and $X$ as previously
defined, we have

\[
E[A \mid X = i] = \begin{cases}
1 + E[A], & \text{if } i = 1 \\
1, & \text{if } i = x \\
1 + E[T_x], & \text{if } i \ne 1, x
\end{cases}
\]

Therefore,

\[
E[A] = 1 + E[A]\,p_1 + E[T_x](1 - p_1 - p_x)
\]

or

\[
E[A] = \frac{1}{1 - p_1} + E[T_x]\,\frac{1 - p_1 - p_x}{1 - p_1}
\]

Consequently,

\[
\begin{aligned}
E[T] &= E[T_x] - E[A] \\
&= \frac{p_x E[T_x] - 1}{1 - p_1} \\
&= \frac{(p_1)^{-n} - 1}{1 - p_1}
\end{aligned}
\]

where the final equality used that $E[T_x] = \dfrac{1}{p_1^{n}\, p_x}$.
The mean occurrence time of any overlapping pattern $P = (i_1, \ldots, i_n)$ can
be obtained by the preceding method. Namely, let $T_x$ be the time until the
nonoverlapping pattern $P_x = (i_1, \ldots, i_n, x)$ occurs; then use the identity

\[
E[T_x] = E[T] + E[A]
\]

to relate $E[T]$ and $E[T_x] = \dfrac{1}{p\,p_x}$; then condition on the next data value
after $P$ occurs to obtain an expression for $E[A]$ in terms of quantities of the form

\[
E[T_x \mid i_1, \ldots, i_r] = E[T_x] - E[T(i_1, \ldots, i_r)]
\]

If $(i_1, \ldots, i_r)$ is nonoverlapping, use the nonoverlapping result to obtain
$E[T(i_1, \ldots, i_r)]$; otherwise, repeat the process on the subpattern $(i_1, \ldots, i_r)$.
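Carrying the recursion through for a general pattern gives a sum of reciprocal prefix probabilities, one term for the full pattern and one for each overlap length. The text does not state the result in this form; it is the standard closed form, and it agrees with both worked examples above. The sketch below assumes the probabilities are given as a dictionary keyed by data value, and the function name is ours.

```python
from fractions import Fraction


def mean_pattern_time(pattern, probs):
    """E[T] for an i.i.d. sequence: sum of 1 / P(prefix) over matching prefixes.

    A prefix of length k contributes when it equals the suffix of length k;
    k = len(pattern) always contributes, k < len(pattern) only for overlaps.
    """
    n = len(pattern)
    total = Fraction(0)
    prefix_prob = Fraction(1)
    for k in range(1, n + 1):
        prefix_prob *= probs[pattern[k - 1]]
        if pattern[:k] == pattern[n - k:]:
            total += 1 / prefix_prob
    return total


probs = {1: Fraction(1, 4), 3: Fraction(1, 4), 5: Fraction(1, 2)}
# Overlapping pattern from the text: 1/(p3^2 p5^2 p1) + 1/(p3 p5)
print(mean_pattern_time((3, 5, 1, 3, 5), probs))
# Nonoverlapping pattern: 1/(p3 p5 p1)
print(mean_pattern_time((3, 5, 1), probs))

coin = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
print(mean_pattern_time(("h", "t", "h"), coin))
```

For a run of n identical values with probability p_1 the sum is a geometric series that reduces to the expression obtained above for consecutive successes.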


Remark We can utilize the preceding technique even when the pattern $i_1, \ldots, i_n$
includes all the distinct data values. For instance, in coin tossing the pattern of
interest might be h, t, h. Even in such cases, we should let $x$ be a data value that
is not in the pattern and use the preceding technique (even though $p_x = 0$).
Because $p_x$ will appear only in the final answer in the expression
$p_x E[T_x] = \frac{p_x}{p_x\, p}$, by interpreting this fraction as $1/p$ we obtain the
correct answer. (A rigorous approach, yielding the same result, would be to reduce
one of the positive $p_i$ by $\epsilon$, take $p_x = \epsilon$, solve for $E[T]$, and then let
$\epsilon$ go to 0.) ∎


3.6.5 The k-Record Values of Discrete Random Variables 


Let $X_1, X_2, \ldots$ be independent and identically distributed random variables
whose set of possible values is the positive integers, and let $P\{X = j\}$, $j \ge 1$,
denote their common probability mass function. Suppose that these random variables
are observed in sequence, and say that $X_n$ is a k-record value if

\[
X_i \ge X_n \quad \text{for exactly } k \text{ of the values } i,\ i = 1, \ldots, n
\]

That is, the $n$th value in the sequence is a k-record value if exactly $k$ of the first $n$
values (including $X_n$) are at least as large as it. Let $\mathbf{R}_k$ denote the ordered
set of k-record values.

It is a rather surprising result that not only do the sequences of k-record val- 
ues have the same probability distributions for all k, these sequences are also 
independent of each other. This result is known as Ignatov’s theorem. 


Theorem 3.1 (Ignatov’s Theorem) $\mathbf{R}_k$, $k \ge 1$, are independent and
identically distributed random vectors.


Proof. Define a series of subsequences of the data sequence $X_1, X_2, \ldots$ by letting
the $i$th subsequence consist of all data values that are at least as large as $i$, $i \ge 1$.
For instance, if the data sequence is

2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1, ...

then the subsequences are as follows:

≥ 1:  2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1, ...
≥ 2:  2, 5, 6, 9, 8, 3, 4, 5, 7, 8, 2, 3, 4, 2, 5, 6, ...
≥ 3:  5, 6, 9, 8, 3, 4, 5, 7, 8, 3, 4, 5, 6, ...

and so on.

Let $X_j^i$ be the $j$th element of subsequence $i$. That is, $X_j^i$ is the $j$th data value
that is at least as large as $i$. An important observation is that $i$ is a k-record value
if and only if $X_k^i = i$. That is, $i$ will be a k-record value if and only if the $k$th
value to be at least as large as $i$ is equal to $i$. (For instance, for the preceding data,
since the fifth value to be at least as large as 3 is equal to 3 it follows that 3 is a
five-record value.) Now, it is not difficult to see that, independent of which values
in the first subsequence are equal to 1, the values in the second subsequence are
independent and identically distributed according to the mass function

\[
P\{\text{value in second subsequence} = j\} = P\{X = j \mid X \ge 2\}, \qquad j \ge 2
\]

Similarly, independent of which values in the first subsequence are equal to 1 and
which values in the second subsequence are equal to 2, the values in the third
subsequence are independent and identically distributed according to the mass
function

\[
P\{\text{value in third subsequence} = j\} = P\{X = j \mid X \ge 3\}, \qquad j \ge 3
\]

and so on. It therefore follows that the events $\{X_j^i = i\}$, $i \ge 1$, $j \ge 1$, are
independent and

\[
P\{i \text{ is a k-record value}\} = P\{X_k^i = i\} = P\{X = i \mid X \ge i\}
\]

It now follows from the independence of the events $\{X_k^i = i\}$, $i \ge 1$, and the
fact that $P\{i \text{ is a k-record value}\}$ does not depend on $k$, that $\mathbf{R}_k$ has the
same distribution for all $k \ge 1$. In addition, it follows from the independence of the
events $\{X_k^i = i\}$, that the random vectors $\mathbf{R}_k$, $k \ge 1$, are also independent. ∎
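The definition of a k-record value translates directly into code. The sketch below applies it to the 20 data values listed in the proof (only that finite prefix, since the rest of the sequence is not given); the function name is an illustrative choice.

```python
def k_record_values(data, k):
    """Return the k-record values of `data`, in order of appearance.

    X_n is a k-record value when exactly k of X_1, ..., X_n (including X_n
    itself) are at least as large as X_n.
    """
    records = []
    for n, x_n in enumerate(data, start=1):
        at_least = sum(1 for x in data[:n] if x >= x_n)
        if at_least == k:
            records.append(x_n)
    return records


data = [2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1]
for k in (1, 2, 3, 5):
    print(k, k_record_values(data, k))
```

For k = 1 this returns the usual strict record values, and for k = 5 the list contains 3, matching the remark that 3 is a five-record value for these data.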


Suppose now that the $X_i$, $i \ge 1$ are independent finite-valued random variables
with probability mass function

\[
p_i = P\{X = i\}, \qquad i = 1, \ldots, m
\]

and let

\[
T = \min\{n : X_i \ge X_n \text{ for exactly } k \text{ of the values } i,\ i = 1, \ldots, n\}
\]

denote the first k-record index. We will now determine its mean.
Proposition 3.3 Let $\lambda_i = p_i \big/ \sum_{j=i}^{m} p_j$, $i = 1, \ldots, m$. Then

\[
E[T] = k + (k - 1) \sum_{i=1}^{m-1} \lambda_i
\]


Proof. To begin, suppose that the observed random variables $X_1, X_2, \ldots$ take on
one of the values $i, i + 1, \ldots, m$ with respective probabilities

\[
P\{X = j\} = \frac{p_j}{p_i + \cdots + p_m}, \qquad j = i, \ldots, m
\]

Let $T_i$ denote the first k-record index when the observed data have the preceding
mass function, and note that since each data value is at least $i$ it follows that
the k-record value will equal $i$, and $T_i$ will equal $k$, if $X_k = i$. As a result,

\[
E[T_i \mid X_k = i] = k
\]

On the other hand, if $X_k > i$ then the k-record value will exceed $i$, and so
all data values equal to $i$ can be disregarded when searching for the k-record
value. In addition, since each data value greater than $i$ will have probability mass
function

\[
P\{X = j \mid X > i\} = \frac{p_j}{p_{i+1} + \cdots + p_m}, \qquad j = i + 1, \ldots, m
\]

it follows that the total number of data values greater than $i$ that need be observed
until a k-record value appears has the same distribution as $T_{i+1}$. Hence,


\[
E[T_i \mid X_k > i] = E[T_{i+1} + N_i \mid X_k > i]
\]

where $T_{i+1}$ is the total number of variables greater than $i$ that we need observe to
obtain a k-record, and $N_i$ is the number of values equal to $i$ that are observed in
that time. Now, given that $X_k > i$ and that $T_{i+1} = n$ ($n \ge k$) it follows that the
time to observe $T_{i+1}$ values greater than $i$ has the same distribution as the number
of trials to obtain $n$ successes given that trial $k$ is a success and that each trial is
independently a success with probability $1 - p_i \big/ \sum_{j \ge i} p_j = 1 - \lambda_i$. Thus,
since the number of trials needed to obtain a success is a geometric random variable
with mean $1/(1 - \lambda_i)$, we see that

\[
E[T_i \mid T_{i+1}, X_k > i] = 1 + \frac{T_{i+1} - 1}{1 - \lambda_i} = \frac{T_{i+1} - \lambda_i}{1 - \lambda_i}
\]

Taking expectations gives

\[
E[T_i \mid X_k > i] = E\left[\frac{T_{i+1} - \lambda_i}{1 - \lambda_i} \,\Big|\, X_k > i\right]
= \frac{E[T_{i+1}] - \lambda_i}{1 - \lambda_i}
\]

Thus, upon conditioning on whether $X_k = i$, we obtain

\[
\begin{aligned}
E[T_i] &= E[T_i \mid X_k = i]\,\lambda_i + E[T_i \mid X_k > i](1 - \lambda_i) \\
&= (k - 1)\lambda_i + E[T_{i+1}]
\end{aligned}
\]


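Starting with $E[T_m] = k$, the preceding gives

\[
\begin{aligned}
E[T_{m-1}] &= (k - 1)\lambda_{m-1} + k \\
E[T_{m-2}] &= (k - 1)\lambda_{m-2} + (k - 1)\lambda_{m-1} + k \\
&= (k - 1) \sum_{j=m-2}^{m-1} \lambda_j + k \\
E[T_{m-3}] &= (k - 1)\lambda_{m-3} + (k - 1) \sum_{j=m-2}^{m-1} \lambda_j + k \\
&= (k - 1) \sum_{j=m-3}^{m-1} \lambda_j + k
\end{aligned}
\]

In general,

\[
E[T_i] = (k - 1) \sum_{j=i}^{m-1} \lambda_j + k
\]

and the result follows since $T = T_1$. ∎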
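Proposition 3.3 can be evaluated directly and compared with a simulation of the first k-record index. The sketch below is one way to do this, assuming the mass function is passed as a list `p` with `p[0]` playing the role of $p_1$; both function names and the choice of test distribution are ours.

```python
import random


def mean_first_k_record_index(p, k):
    """Proposition 3.3: E[T] = k + (k - 1) * sum_{i=1}^{m-1} lambda_i."""
    m = len(p)
    # lambda_i = p_i / (p_i + ... + p_m), for i = 1, ..., m - 1
    lambdas = [p[i] / sum(p[i:]) for i in range(m - 1)]
    return k + (k - 1) * sum(lambdas)


def simulate_first_k_record_index(p, k, rng):
    """Draw X_1, X_2, ... until some X_n has exactly k of X_1..X_n >= X_n."""
    values = list(range(1, len(p) + 1))
    weights = [float(w) for w in p]
    seen = []
    while True:
        x = rng.choices(values, weights=weights)[0]
        seen.append(x)
        if sum(1 for y in seen if y >= x) == k:
            return len(seen)


p, k = [0.2, 0.3, 0.5], 3
rng = random.Random(1)
runs = 5000
estimate = sum(simulate_first_k_record_index(p, k, rng) for _ in range(runs)) / runs
print(mean_first_k_record_index(p, k), estimate)
```

Two easy special cases are visible in the formula: for k = 1 the first value is always a 1-record, so E[T] = 1, and for m = 1 every value ties, so E[T] = k.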


3.6.6 Left Skip Free Random Walks 


Let $X_i$, $i \ge 1$ be independent and identically distributed random variables. Let
$P_j = P(X_i = j)$ and suppose that

\[
\sum_{j=-1}^{\infty} P_j = 1
\]

That is, the possible values of the $X_i$ are $-1, 0, 1, \ldots$. If we take

\[
S_0 = 0, \qquad S_n = \sum_{i=1}^{n} X_i
\]

then the sequence of random variables $S_n$, $n \ge 0$ is called a left skip free random
walk. (It is called left skip free because $S_n$ can decrease from $S_{n-1}$ by at most 1.)

For an application consider a gambler who makes a sequence of identical bets,
for which he can lose at most 1 on each bet. If $X_i$ represents the gambler’s
winnings on bet $i$, then $S_n$ represents his total winnings after the first
$n$ bets.

Suppose that the gambler is playing in an unfair game, in the sense that
$E[X_i] < 0$, and let $v = -E[X_i]$. Also, let $T_0 = 0$, and for $k > 0$, let $T_{-k}$ denote
the number of bets until the gambler is losing $k$. That is,

\[
T_{-k} = \min\{n : S_n = -k\}
\]

It should be noted that $T_{-k} < \infty$; that is, the random walk will eventually hit $-k$.
This is so because, by the strong law of large numbers, $S_n/n \to E[X_i] < 0$, which
implies that $S_n \to -\infty$. We are interested in determining $E[T_{-k}]$ and
$\text{Var}(T_{-k})$. (It can be shown that both are finite when $E[X_i] < 0$.)

The key to the analysis is to note that the number of bets until one’s
fortune decreases by $k$ can be expressed as the number of bets until it decreases
by 1 (namely, $T_{-1}$), plus the additional number of bets after the decrease is
1 until the total decrease is 2 (namely, $T_{-2} - T_{-1}$), plus the additional
number of bets after the decrease is 2 until it is 3 (namely, $T_{-3} - T_{-2}$), and so on.
That is,

\[
T_{-k} = T_{-1} + \sum_{j=2}^{k} \left(T_{-j} - T_{-(j-1)}\right)
\]

However, because the results of all bets are independent and identically
distributed, it follows that $T_{-1}, T_{-2} - T_{-1}, T_{-3} - T_{-2}, \ldots, T_{-k} - T_{-(k-1)}$ are
all independent and identically distributed. (That is, starting at any instant, the
number of additional bets until the gambler’s fortune is one less than it is at that
instant is independent of prior results and has the same distribution as $T_{-1}$.)
Consequently, the mean and variance of $T_{-k}$, the sum of these $k$ random variables,
are

\[
E[T_{-k}] = k\,E[T_{-1}]
\]

and

\[
\text{Var}(T_{-k}) = k\,\text{Var}(T_{-1})
\]
We now compute the mean and variance of T_; by conditioning on Xj, the 
result of the initial bet. Now, given X1, T_1 is equal to 1 plus the number of 


bets it takes until the gambler’s fortune decreases by X, + 1 from what it 
is after the initial bet. Consequently, given X 1, T_1 has the same distribution 
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as 1 + T_x,41). Hence, 


E(T_1|X1] = 1+ ElT_ajs4y] = 14+ (1 + DET_1] 
Var(T_1|X1) = Var(T_(x,41)) = (X1 + 1)Var(T_1) 


Consequently,

\[ E[T_{-1}] = E[E[T_{-1} | X_1]] = 1 + (-v + 1) E[T_{-1}] \]

or

\[ E[T_{-1}] = \frac{1}{v} \]

which shows that

\[ E[T_{-k}] = \frac{k}{v} \tag{3.33} \]


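Similarly, with $\sigma^2 = \mathrm{Var}(X_1)$, the conditional variance formula yields

\[ \begin{aligned}
\mathrm{Var}(T_{-1}) &= E[(X_1 + 1)\mathrm{Var}(T_{-1})] + \mathrm{Var}(X_1 E[T_{-1}]) \\
&= (1 - v)\mathrm{Var}(T_{-1}) + (E[T_{-1}])^2 \sigma^2 \\
&= (1 - v)\mathrm{Var}(T_{-1}) + \frac{\sigma^2}{v^2}
\end{aligned} \]

thus showing that

\[ \mathrm{Var}(T_{-1}) = \frac{\sigma^2}{v^3} \]

and yielding the result

\[ \mathrm{Var}(T_{-k}) = \frac{k\sigma^2}{v^3} \tag{3.34} \]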
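The formulas (3.33) and (3.34) can be sanity-checked by simulation. The sketch below is only an illustration: the step distribution ($P_{-1} = 0.5$, $P_0 = 0.3$, $P_1 = 0.2$), the value $k = 3$, the seed, and the helper name `hitting_time` are all assumptions chosen for the example, not part of the text.

```python
import random


def hitting_time(k, values, probs, rng):
    """Simulate bets until the running total first equals -k; return the count."""
    total, n = 0, 0
    while total != -k:  # steps are >= -1, so -k cannot be jumped over
        total += rng.choices(values, weights=probs)[0]
        n += 1
    return n


# Hypothetical step distribution with negative drift.
values, probs = [-1, 0, 1], [0.5, 0.3, 0.2]
mean_x = sum(x * p for x, p in zip(values, probs))
v = -mean_x
sigma2 = sum(x * x * p for x, p in zip(values, probs)) - mean_x ** 2

k = 3
rng = random.Random(2024)  # fixed seed so the run is reproducible
samples = [hitting_time(k, values, probs, rng) for _ in range(20000)]
est_mean = sum(samples) / len(samples)
est_var = sum((t - est_mean) ** 2 for t in samples) / len(samples)

theory_mean = k / v                 # Equation (3.33)
theory_var = k * sigma2 / v ** 3    # Equation (3.34)
print(f"mean: simulated {est_mean:.3f}, theory {theory_mean:.3f}")
print(f"var:  simulated {est_var:.3f}, theory {theory_var:.3f}")
```

With these assumed parameters $v = 0.3$ and $\sigma^2 = 0.61$, so the theoretical mean is 10 and the theoretical variance is about 67.8; the simulated values should be close to these but will not match exactly.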


There are many interesting results about skip free random walks. One of them is
the hitting time theorem.

Proposition 3.4 (The Hitting Time Theorem)

\[ P(T_{-k} = n) = \frac{k}{n} P(S_n = -k), \quad n \ge 1 \]

Proof. The proof is by induction on $n$. Now, when $n = 1$ we must prove

\[ P(T_{-k} = 1) = k P(S_1 = -k) \]

However, the preceding is true when $k = 1$ because

\[ P(T_{-1} = 1) = P(S_1 = -1) = P_{-1} \]

and it is true when $k > 1$ because

\[ P(T_{-k} = 1) = 0 = P(S_1 = -k), \quad k > 1 \]

Thus the result is true when $n = 1$. So assume that for a fixed value $n > 1$ and
all $k > 0$

\[ P(T_{-k} = n - 1) = \frac{k}{n-1} P(S_{n-1} = -k) \tag{3.35} \]

Now consider $P(T_{-k} = n)$. Conditioning on $X_1$ yields

\[ P(T_{-k} = n) = \sum_{j=-1}^{\infty} P(T_{-k} = n | X_1 = j) P_j \]

Now, if the gambler wins $j$ on his initial bet, then the first time that he is down
$k$ will occur after bet $n$ if the first time that his cumulative losses after the initial
gamble is $k + j$ occurs after an additional $n - 1$ bets. That is,

\[ P(T_{-k} = n | X_1 = j) = P(T_{-(k+j)} = n - 1) \]

Consequently,

\[ \begin{aligned}
P(T_{-k} = n) &= \sum_{j=-1}^{\infty} P(T_{-k} = n | X_1 = j) P_j \\
&= \sum_{j=-1}^{\infty} P(T_{-(k+j)} = n - 1) P_j \\
&= \sum_{j=-1}^{\infty} \frac{k+j}{n-1} P(S_{n-1} = -(k+j)) P_j
\end{aligned} \]

where the last equality follows by Induction Hypothesis (3.35). Using that

\[ P(S_n = -k | X_1 = j) = P(S_{n-1} = -(k+j)) \]

the preceding yields

\[ \begin{aligned}
P(T_{-k} = n) &= \sum_{j=-1}^{\infty} \frac{k+j}{n-1} P(S_n = -k | X_1 = j) P_j \\
&= \sum_{j=-1}^{\infty} \frac{k+j}{n-1} P(S_n = -k, X_1 = j) \\
&= \sum_{j=-1}^{\infty} \frac{k+j}{n-1} P(X_1 = j | S_n = -k) P(S_n = -k) \\
&= P(S_n = -k) \left\{ \frac{k}{n-1} \sum_{j=-1}^{\infty} P(X_1 = j | S_n = -k)
   + \frac{1}{n-1} \sum_{j=-1}^{\infty} j P(X_1 = j | S_n = -k) \right\} \\
&= P(S_n = -k) \left\{ \frac{k}{n-1} + \frac{1}{n-1} E[X_1 | S_n = -k] \right\}
\end{aligned} \tag{3.36} \]

However,

\[ \begin{aligned}
-k &= E[S_n | S_n = -k] \\
&= E[X_1 + \cdots + X_n | S_n = -k] \\
&= \sum_{i=1}^{n} E[X_i | S_n = -k] \\
&= n E[X_1 | S_n = -k]
\end{aligned} \]

where the final equation follows because $X_1, \ldots, X_n$ are independent and identically
distributed and thus the distribution of $X_i$ given that $X_1 + \cdots + X_n = -k$
is the same for all $i$. Hence,

\[ E[X_1 | S_n = -k] = -\frac{k}{n} \]

Substituting the preceding into (3.36) gives

\[ P(T_{-k} = n) = P(S_n = -k) \left( \frac{k}{n-1} - \frac{1}{n-1} \frac{k}{n} \right) = \frac{k}{n} P(S_n = -k) \]

and completes the proof. ■

Suppose that after $n$ bets the gambler is down $k$. Then the conditional probability
that this is the first time he has ever been down $k$ is

\[ \begin{aligned}
P(T_{-k} = n | S_n = -k) &= \frac{P(T_{-k} = n, S_n = -k)}{P(S_n = -k)} \\
&= \frac{P(T_{-k} = n)}{P(S_n = -k)} \\
&= \frac{k}{n} \quad \text{(by the hitting time theorem)}
\end{aligned} \]

Let us suppose for the remainder of this section that $-v = E[X] < 0$.
Combining the hitting time theorem with our previously derived result about
$E[T_{-k}]$ gives the following:

\[ \begin{aligned}
\frac{k}{v} &= E[T_{-k}] \\
&= \sum_{n=1}^{\infty} n P(T_{-k} = n) \\
&= \sum_{n=1}^{\infty} k P(S_n = -k)
\end{aligned} \]

where the final equality used the hitting time theorem. Hence,

\[ \sum_{n=1}^{\infty} P(S_n = -k) = \frac{1}{v} \]

Let $I_n$ be an indicator variable for the event that $S_n = -k$. That is, let

\[ I_n = \begin{cases} 1, & \text{if } S_n = -k \\ 0, & \text{if } S_n \ne -k \end{cases} \]

and note that

\[ \text{total time gambler's fortune is } -k = \sum_{n=1}^{\infty} I_n \]

Taking expectations gives

\[ E[\text{total time gambler's fortune is } -k] = \sum_{n=1}^{\infty} P(S_n = -k) = \frac{1}{v} \tag{3.37} \]

Now, let $\alpha$ be the probability that the random walk is always negative after the
initial movement. That is,

\[ \alpha = P(S_n < 0 \text{ for all } n \ge 1) \]

To determine $\alpha$ note that each time the gambler's fortune is $-k$ the probability
that it will never again hit $-k$ (because all cumulative winnings starting at that
time are negative) is $\alpha$. Hence, the number of times that the gambler's fortune is
$-k$ is a geometric random variable with parameter $\alpha$, and thus has mean $1/\alpha$.
Consequently, from (3.37)

\[ \alpha = v \]


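Let us now define $L_{-k}$ to equal the last time that the random walk hits $-k$.
Because $L_{-k}$ will equal $n$ if $S_n = -k$ and the sequence of cumulative winnings
from time $n$ onwards is always negative, we see that

\[ P(L_{-k} = n) = P(S_n = -k)\alpha = P(S_n = -k) v \]

Hence,

\[ \begin{aligned}
E[L_{-k}] &= \sum_{n=0}^{\infty} n P(L_{-k} = n) \\
&= v \sum_{n=0}^{\infty} n P(S_n = -k) \\
&= v \sum_{n=0}^{\infty} n \frac{n}{k} P(T_{-k} = n) \quad \text{by the hitting time theorem} \\
&= \frac{v}{k} \sum_{n=0}^{\infty} n^2 P(T_{-k} = n) \\
&= \frac{v}{k} E[T_{-k}^2] \\
&= \frac{v}{k} \left\{ E^2[T_{-k}] + \mathrm{Var}(T_{-k}) \right\} \\
&= \frac{k}{v} + \frac{\sigma^2}{v^2}
\end{aligned} \]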
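Both Equation (3.37) and the expression for $E[L_{-k}]$ can be checked numerically by summing $P(S_n = -k)$ over a long but finite horizon. The sketch below is a rough check under assumed values: the step distribution, $k = 3$, and the truncation point of 500 steps are illustrative choices, and the truncated tail is negligible only because the assumed drift is clearly negative.

```python
def walk_pmfs(step_pmf, n_max):
    """Yield (n, pmf of S_n) for n = 1..n_max, by repeated convolution."""
    current = {0: 1.0}
    for n in range(1, n_max + 1):
        nxt = {}
        for s, p in current.items():
            for x, q in step_pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + p * q
        current = nxt
        yield n, current


# Hypothetical step distribution with E[X] = -0.3.
step_pmf = {-1: 0.5, 0: 0.3, 1: 0.2}
v = -sum(x * q for x, q in step_pmf.items())
sigma2 = sum(x * x * q for x, q in step_pmf.items()) - v ** 2

k = 3
expected_visits = 0.0     # sum over n of P(S_n = -k), compare with 1/v
expected_last_time = 0.0  # sum over n of n * P(S_n = -k) * v, compare with k/v + sigma^2/v^2
for n, pmf in walk_pmfs(step_pmf, 500):
    p_at = pmf.get(-k, 0.0)
    expected_visits += p_at
    expected_last_time += n * p_at * v

print(expected_visits, 1 / v)
print(expected_last_time, k / v + sigma2 / v ** 2)
```

The first pair of printed numbers compares the truncated sum with $1/v$; the second compares $v \sum_n n P(S_n = -k)$ with $k/v + \sigma^2/v^2$.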


3.7 An Identity for Compound Random Variables 


Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, and let $S_n = \sum_{i=1}^{n} X_i$ be the sum of the first $n$ of them, $n \ge 0$, where
$S_0 = 0$. Recall that if $N$ is a nonnegative integer valued random variable that is
independent of the sequence $X_1, X_2, \ldots$ then

\[ S_N = \sum_{i=1}^{N} X_i \]

is said to be a compound random variable, with the distribution of $N$ called
the compounding distribution. In this subsection we will first derive an identity
involving such random variables. We will then specialize to where the $X_i$ are
positive integer valued random variables, prove a corollary of the identity, and
then use this corollary to develop a recursive formula for the probability mass
function of $S_N$, for a variety of common compounding distributions.

To begin, let $M$ be a random variable that is independent of the sequence
$X_1, X_2, \ldots$, and which is such that

\[ P\{M = n\} = \frac{n P\{N = n\}}{E[N]}, \quad n = 1, 2, \ldots \]

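Proposition 3.5 (The Compound Random Variable Identity) For any function $h$

\[ E[S_N h(S_N)] = E[N] E[X_1 h(S_M)] \]

Proof.

\[ \begin{aligned}
E[S_N h(S_N)] &= E\left[ \sum_{i=1}^{N} X_i h(S_N) \right] \\
&= \sum_{n=0}^{\infty} E\left[ \sum_{i=1}^{N} X_i h(S_N) \,\Big|\, N = n \right] P\{N = n\}
   \quad \text{(by conditioning on } N\text{)} \\
&= \sum_{n=0}^{\infty} E\left[ \sum_{i=1}^{n} X_i h(S_n) \,\Big|\, N = n \right] P\{N = n\} \\
&= \sum_{n=0}^{\infty} E\left[ \sum_{i=1}^{n} X_i h(S_n) \right] P\{N = n\}
   \quad \text{(by independence of } N \text{ and } X_1, \ldots, X_n\text{)} \\
&= \sum_{n=0}^{\infty} \sum_{i=1}^{n} E[X_i h(S_n)] P\{N = n\}
\end{aligned} \]

Now, because $X_1, \ldots, X_n$ are independent and identically distributed, and
$h(S_n) = h(X_1 + \cdots + X_n)$ is a symmetric function of $X_1, \ldots, X_n$, it follows
that the distribution of $X_i h(S_n)$ is the same for all $i = 1, \ldots, n$. Therefore,
continuing the preceding string of equalities yields

\[ \begin{aligned}
E[S_N h(S_N)] &= \sum_{n=0}^{\infty} n E[X_1 h(S_n)] P\{N = n\} \\
&= E[N] \sum_{n=0}^{\infty} E[X_1 h(S_n)] P\{M = n\} \quad \text{(definition of } M\text{)} \\
&= E[N] \sum_{n=0}^{\infty} E[X_1 h(S_n) | M = n] P\{M = n\}
   \quad \text{(independence of } M \text{ and } X_1, \ldots, X_n\text{)} \\
&= E[N] \sum_{n=0}^{\infty} E[X_1 h(S_M) | M = n] P\{M = n\} \\
&= E[N] E[X_1 h(S_M)]
\end{aligned} \]

which proves the proposition. ■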
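The identity can be verified exactly on a toy example by enumerating all outcomes. In the sketch below the distributions of $X_i$ and $N$, the function `h`, and the helper `expect_over_sequences` are arbitrary illustrative choices; the distribution of $M$ is built from that of $N$ as $P\{M = n\} = nP\{N = n\}/E[N]$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions used only for this check.
x_pmf = {1: Fraction(1, 3), 2: Fraction(2, 3)}
n_pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 3), 3: Fraction(1, 6)}


def h(s):
    """An arbitrary test function."""
    return s * s + 1


mean_n = sum(n * p for n, p in n_pmf.items())
# P{M = n} = n P{N = n} / E[N], n = 1, 2, ...
m_pmf = {n: n * p / mean_n for n, p in n_pmf.items() if n >= 1}


def expect_over_sequences(count_pmf, g):
    """E[g(X_1, ..., X_K)] where K has pmf count_pmf and is independent of the X_i."""
    total = Fraction(0)
    for n, pn in count_pmf.items():
        for xs in product(x_pmf, repeat=n):
            prob = pn
            for x in xs:
                prob *= x_pmf[x]
            total += prob * g(xs)
    return total


lhs = expect_over_sequences(n_pmf, lambda xs: sum(xs) * h(sum(xs)))          # E[S_N h(S_N)]
rhs = mean_n * expect_over_sequences(m_pmf, lambda xs: xs[0] * h(sum(xs)))   # E[N] E[X_1 h(S_M)]
print(lhs, rhs, lhs == rhs)
```

Exact fractions are used so that the two sides can be compared with `==` rather than within a floating-point tolerance.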


Suppose now that the $X_i$ are positive integer valued random variables, and let

\[ \alpha_j = P\{X_1 = j\}, \quad j > 0 \]

The successive values of $P\{S_N = k\}$ can often be obtained from the following
corollary to Proposition 3.5.

Corollary 3.6

\[ P\{S_N = 0\} = P\{N = 0\} \]

\[ P\{S_N = k\} = \frac{1}{k} E[N] \sum_{j=1}^{k} j \alpha_j P\{S_{M-1} = k - j\}, \quad k > 0 \]

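Proof. For $k$ fixed, let

\[ h(x) = \begin{cases} 1, & \text{if } x = k \\ 0, & \text{if } x \ne k \end{cases} \]

and note that $S_N h(S_N)$ is either equal to $k$ if $S_N = k$ or is equal to 0 otherwise.
Therefore,

\[ E[S_N h(S_N)] = k P\{S_N = k\} \]

and the compound identity yields

\[ \begin{aligned}
k P\{S_N = k\} &= E[N] E[X_1 h(S_M)] \\
&= E[N] \sum_{j=1}^{\infty} E[X_1 h(S_M) | X_1 = j] \alpha_j \\
&= E[N] \sum_{j=1}^{\infty} j E[h(S_M) | X_1 = j] \alpha_j \\
&= E[N] \sum_{j=1}^{\infty} j P\{S_M = k | X_1 = j\} \alpha_j
\end{aligned} \tag{3.38} \]

Now,

\[ \begin{aligned}
P\{S_M = k | X_1 = j\} &= P\left\{ \sum_{i=1}^{M} X_i = k \,\Big|\, X_1 = j \right\} \\
&= P\left\{ j + \sum_{i=2}^{M} X_i = k \right\} \\
&= P\left\{ j + \sum_{i=1}^{M-1} X_i = k \right\} \\
&= P\{S_{M-1} = k - j\}
\end{aligned} \]

The next to last equality followed because $X_2, \ldots, X_M$ and $X_1, \ldots, X_{M-1}$ have
the same joint distribution; namely that of $M - 1$ independent random variables
that all have the distribution of $X_1$, where $M - 1$ is independent of these random
variables. Thus the proof follows from Equation (3.38). ■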


When the distributions of M — 1 and N are related, the preceding corollary 
can be a useful recursion for computing the probability mass function of $S_N$, as
is illustrated in the following subsections. 


3.7.1 Poisson Compounding Distribution 
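If $N$ is the Poisson distribution with mean $\lambda$, then

\[ \begin{aligned}
P\{M - 1 = n\} &= P\{M = n + 1\} \\
&= \frac{(n+1) P\{N = n+1\}}{E[N]} \\
&= \frac{1}{\lambda} (n+1) e^{-\lambda} \frac{\lambda^{n+1}}{(n+1)!} \\
&= e^{-\lambda} \frac{\lambda^n}{n!}
\end{aligned} \]

Consequently, $M - 1$ is also Poisson with mean $\lambda$. Thus, with

\[ P_n = P\{S_N = n\} \]

the recursion given by Corollary 3.6 can be written

\[ P_0 = e^{-\lambda} \]

\[ P_k = \frac{\lambda}{k} \sum_{j=1}^{k} j \alpha_j P_{k-j}, \quad k > 0 \]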


Remark When the $X_i$ are identically 1, the preceding recursion reduces to the
well-known identity for a Poisson random variable having mean $\lambda$:

\[ P\{N = 0\} = e^{-\lambda} \]

\[ P\{N = n\} = \frac{\lambda}{n} P\{N = n - 1\}, \quad n \ge 1 \]
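Example 3.33 Let $S$ be a compound Poisson random variable with $\lambda = 4$ and

\[ P\{X_i = i\} = 1/4, \quad i = 1, 2, 3, 4 \]

Let us use the recursion given by Corollary 3.6 to determine $P\{S = 5\}$. It gives

\[ \begin{aligned}
P_0 &= e^{-\lambda} = e^{-4} \\
P_1 &= \lambda \alpha_1 P_0 = e^{-4} \\
P_2 &= \frac{\lambda}{2} (\alpha_1 P_1 + 2\alpha_2 P_0) = \frac{3}{2} e^{-4} \\
P_3 &= \frac{\lambda}{3} (\alpha_1 P_2 + 2\alpha_2 P_1 + 3\alpha_3 P_0) = \frac{13}{6} e^{-4} \\
P_4 &= \frac{\lambda}{4} (\alpha_1 P_3 + 2\alpha_2 P_2 + 3\alpha_3 P_1 + 4\alpha_4 P_0) = \frac{73}{24} e^{-4} \\
P_5 &= \frac{\lambda}{5} (\alpha_1 P_4 + 2\alpha_2 P_3 + 3\alpha_3 P_2 + 4\alpha_4 P_1 + 5\alpha_5 P_0) = \frac{381}{120} e^{-4}
\end{aligned} \]

where the term $5\alpha_5 P_0$ in the last line vanishes because $\alpha_5 = 0$. ■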
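The recursion of Example 3.33 is easy to mechanize. The sketch below computes the coefficients $c_k$ in $P_k = c_k e^{-\lambda}$ with exact fractions, which makes each step easy to compare with a hand calculation; the function name `compound_poisson_coeffs` is an assumption made for this illustration. Note that $\alpha_5 = 0$ for this example, so the term $5\alpha_5 P_0$ contributes nothing to $P_5$.

```python
from fractions import Fraction
from math import exp


def compound_poisson_coeffs(lam, alpha, k_max):
    """Return c[0..k_max] such that P{S_N = k} = c[k] * exp(-lam).

    alpha maps j to P{X_1 = j}; values of j not present have probability 0.
    """
    c = [Fraction(1)]  # P_0 = e^{-lam}
    for k in range(1, k_max + 1):
        total = sum(
            j * alpha.get(j, Fraction(0)) * c[k - j] for j in range(1, k + 1)
        )
        c.append(Fraction(lam) / k * total)  # P_k = (lam/k) sum_j j alpha_j P_{k-j}
    return c


# Data of Example 3.33: lambda = 4, X uniform on {1, 2, 3, 4}.
alpha = {j: Fraction(1, 4) for j in (1, 2, 3, 4)}
coeffs = compound_poisson_coeffs(4, alpha, 5)
for k, ck in enumerate(coeffs):
    print(f"P_{k} = ({ck}) e^-4 = {float(ck) * exp(-4):.6f}")
```

The coefficients come out as 1, 1, 3/2, 13/6, 73/24, and 127/40 (that is, 381/120), so $P\{S = 5\} = (381/120)e^{-4} \approx 0.0582$.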
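3.7.2 Binomial Compounding Distribution

Suppose that $N$ is a binomial random variable with parameters $r$ and $p$. Then,

\[ \begin{aligned}
P\{M - 1 = n\} &= \frac{(n+1) P\{N = n+1\}}{E[N]} \\
&= \frac{n+1}{rp} \binom{r}{n+1} p^{n+1} (1-p)^{r-n-1} \\
&= \frac{n+1}{rp} \frac{r!}{(r-1-n)!\,(n+1)!} p^{n+1} (1-p)^{r-1-n} \\
&= \frac{(r-1)!}{(r-1-n)!\,n!} p^n (1-p)^{r-1-n}
\end{aligned} \]

Thus, $M - 1$ is a binomial random variable with parameters $r - 1$, $p$.

Fixing $p$, let $N(r)$ be a binomial random variable with parameters $r$ and $p$,
and let

\[ P_r(k) = P\{S_{N(r)} = k\} \]

Then, Corollary 3.6 yields

\[ P_r(0) = (1-p)^r \]

\[ P_r(k) = \frac{rp}{k} \sum_{j=1}^{k} j \alpha_j P_{r-1}(k-j), \quad k > 0 \]

For instance, letting $k$ equal 1, then 2, and then 3 gives

\[ P_r(1) = rp\alpha_1 (1-p)^{r-1} \]

\[ \begin{aligned}
P_r(2) &= \frac{rp}{2} \left[ \alpha_1 P_{r-1}(1) + 2\alpha_2 P_{r-1}(0) \right] \\
&= \frac{rp}{2} \left[ (r-1) p \alpha_1^2 (1-p)^{r-2} + 2\alpha_2 (1-p)^{r-1} \right]
\end{aligned} \]

\[ \begin{aligned}
P_r(3) &= \frac{rp}{3} \left[ \alpha_1 P_{r-1}(2) + 2\alpha_2 P_{r-1}(1) + 3\alpha_3 P_{r-1}(0) \right] \\
&= \frac{\alpha_1 rp}{3} \frac{(r-1)p}{2} \left[ (r-2) p \alpha_1^2 (1-p)^{r-3} + 2\alpha_2 (1-p)^{r-2} \right] \\
&\quad + \frac{2\alpha_2 rp}{3} (r-1) p \alpha_1 (1-p)^{r-2} + \alpha_3 rp (1-p)^{r-1}
\end{aligned} \]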
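The binomial recursion lowers $r$ by one at each step, so it is natural to implement it as a memoized recursive function. The sketch below compares it against a direct computation that conditions on $N$ and convolves the $X_i$; the values of `p` and `alpha` and both function names are assumptions chosen only for the illustration.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

# Hypothetical parameters for the check.
p = Fraction(1, 3)
alpha = {1: Fraction(1, 2), 2: Fraction(3, 10), 3: Fraction(1, 5)}


@lru_cache(maxsize=None)
def binomial_compound(r, k):
    """P_r(k) = P{S_{N(r)} = k} via the recursion from Corollary 3.6."""
    if k == 0:
        return (1 - p) ** r
    if r == 0:
        return Fraction(0)  # N(0) = 0, so the sum cannot be positive
    return r * p / k * sum(
        j * alpha.get(j, Fraction(0)) * binomial_compound(r - 1, k - j)
        for j in range(1, k + 1)
    )


def direct(r, k):
    """P{S_{N(r)} = k} by conditioning on N and convolving the X_i."""
    conv = {0: Fraction(1)}  # pmf of X_1 + ... + X_n, starting with n = 0
    total = Fraction(0)
    for n in range(r + 1):
        total += comb(r, n) * p ** n * (1 - p) ** (r - n) * conv.get(k, Fraction(0))
        nxt = {}
        for s, q in conv.items():
            for x, a in alpha.items():
                nxt[s + x] = nxt.get(s + x, Fraction(0)) + q * a
        conv = nxt
    return total


print(all(binomial_compound(r, k) == direct(r, k) for r in range(6) for k in range(8)))
```

Memoization matters here because $P_{r-1}(k-j)$ is requested many times for the same arguments as the recursion unwinds.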
3.7.3 A Compounding Distribution Related to the Negative Binomial

Suppose, for a fixed value of $p$, $0 < p < 1$, the compounding random variable $N$
has a probability mass function

\[ P\{N = n\} = \binom{n+r-1}{r-1} p^r (1-p)^n, \quad n = 0, 1, \ldots \]

Such a random variable can be thought of as being the number of failures that
occur before a total of $r$ successes have been amassed when each trial is independently
a success with probability $p$. (There will be $n$ such failures if the $r$th
success occurs on trial $n + r$. Consequently, $N + r$ is a negative binomial random
variable with parameters $r$ and $p$.) Using that the mean of the negative binomial
random variable $N + r$ is $E[N + r] = r/p$, we see that $E[N] = r\frac{1-p}{p}$.

Regard $p$ as fixed, and call $N$ an NB($r$) random variable. The random variable
$M - 1$ has probability mass function

\[ \begin{aligned}
P\{M - 1 = n\} &= \frac{(n+1) P\{N = n+1\}}{E[N]} \\
&= \frac{(n+1)p}{r(1-p)} \binom{n+r}{r-1} p^r (1-p)^{n+1} \\
&= \frac{(n+r)!}{r!\,n!} p^{r+1} (1-p)^n \\
&= \binom{n+r}{r} p^{r+1} (1-p)^n
\end{aligned} \]

In other words, $M - 1$ is an NB($r + 1$) random variable.

Letting, for an NB($r$) random variable $N$,

\[ P_r(k) = P\{S_N = k\} \]

Corollary 3.6 yields

\[ P_r(0) = p^r \]

\[ P_r(k) = \frac{r(1-p)}{kp} \sum_{j=1}^{k} j \alpha_j P_{r+1}(k-j), \quad k > 0 \]

Thus,

\[ \begin{aligned}
P_r(1) &= \frac{r(1-p)}{p} \alpha_1 P_{r+1}(0) \\
&= r p^r (1-p) \alpha_1
\end{aligned} \]

\[ \begin{aligned}
P_r(2) &= \frac{r(1-p)}{2p} \left[ \alpha_1 P_{r+1}(1) + 2\alpha_2 P_{r+1}(0) \right] \\
&= \frac{r(1-p)}{2p} \left[ \alpha_1^2 (r+1) p^{r+1} (1-p) + 2\alpha_2 p^{r+1} \right]
\end{aligned} \]

\[ P_r(3) = \frac{r(1-p)}{3p} \left[ \alpha_1 P_{r+1}(2) + 2\alpha_2 P_{r+1}(1) + 3\alpha_3 P_{r+1}(0) \right] \]

and so on.
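In this case the recursion raises $r$ while lowering $k$, so it still terminates at $P_{r'}(0) = p^{r'}$. The sketch below implements it and compares it with a direct sum; because the $X_i$ are at least 1, $S_N = k$ forces $N \le k$, so the direct sum over $n$ is finite and exact. The parameter values and function names are assumptions for illustration.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

# Hypothetical parameters for the check.
p = Fraction(2, 5)
alpha = {1: Fraction(1, 2), 2: Fraction(1, 2)}


@lru_cache(maxsize=None)
def nb_compound(r, k):
    """P_r(k) for an NB(r) compounding variable, via Corollary 3.6."""
    if k == 0:
        return p ** r
    coeff = Fraction(r) * (1 - p) / (k * p)
    return coeff * sum(
        j * alpha.get(j, Fraction(0)) * nb_compound(r + 1, k - j)
        for j in range(1, k + 1)
    )


def direct(r, k):
    """P{S_N = k} summed over N = 0..k (enough since every X_i >= 1)."""
    conv = {0: Fraction(1)}  # pmf of X_1 + ... + X_n, starting with n = 0
    total = Fraction(0)
    for n in range(k + 1):
        pn = comb(n + r - 1, r - 1) * p ** r * (1 - p) ** n  # P{N = n}
        total += pn * conv.get(k, Fraction(0))
        nxt = {}
        for s, q in conv.items():
            for x, a in alpha.items():
                nxt[s + x] = nxt.get(s + x, Fraction(0)) + q * a
        conv = nxt
    return total


print(all(nb_compound(r, k) == direct(r, k) for r in range(1, 5) for k in range(8)))
```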
Exercises 


1. If X and Y are both discrete, show that $\sum_x p_{X|Y}(x|y) = 1$ for all y such that
$p_Y(y) > 0$.
*2. Let $X_1$ and $X_2$ be independent geometric random variables having the same
parameter p. Guess the value of

\[ P\{X_1 = i | X_1 + X_2 = n\} \]

Hint: Suppose a coin having probability p of coming up heads is continually
flipped. If the second head occurs on flip number n, what is the conditional
probability that the first head was on flip number i, i = 1, ..., n − 1?

Verify your guess analytically.
3. The joint probability mass function of X and Y, p(x, y), is given by

p(1, 1) = 1/9,  p(2, 1) = 1/3,  p(3, 1) = 1/9,
p(1, 2) = 1/9,  p(2, 2) = 0,    p(3, 2) = 1/18,
p(1, 3) = 0,    p(2, 3) = 1/6,  p(3, 3) = 1/9

Compute E[X|Y = i] for i = 1, 2, 3.
4. In Exercise 3, are the random variables X and Y independent?


5. An urn contains three white, six red, and five black balls. Six of these balls are 
randomly selected from the urn. Let X and Y denote respectively the number of 
white and black balls selected. Compute the conditional probability mass function 
of X given that Y = 3. Also compute E[X|Y = 1]. 


*6. Repeat Exercise 5 but under the assumption that when a ball is selected its color is 
noted, and it is then replaced in the urn before the next selection is made. 


7. Suppose p(x, y, z), the joint probability mass function of the random variables X,
Y, and Z, is given by

p(1, 1, 1) = 1/8,   p(2, 1, 1) = 1/4,
p(1, 1, 2) = 1/8,   p(2, 1, 2) = 3/16,
p(1, 2, 1) = 1/16,  p(2, 2, 1) = 0,
p(1, 2, 2) = 0,     p(2, 2, 2) = 1/4

What is E[X|Y = 2]? What is E[X|Y = 2, Z = 1]?
8. An unbiased die is successively rolled. Let X and Y denote, respectively, 
the number of rolls necessary to obtain a six and a five. Find (a) E[X], 
(b) E[X|Y = 1], (c) E[X|Y = 5].
9. Show in the discrete case that if X and Y are independent, then 
E[X|Y = y] = E[X] for all y 
10. Suppose X and Y are independent continuous random variables. Show that 
E[X|Y = y] = E[X] for all y 
11. The joint density of X and Y is

\[ f(x, y) = \frac{(y^2 - x^2)}{8} e^{-y}, \quad 0 < y < \infty, \; -y \le x \le y \]

Show that E[X|Y = y] = 0.
12. The joint density of X and Y is given by

\[ f(x, y) = \frac{e^{-x/y} e^{-y}}{y}, \quad 0 < x < \infty, \; 0 < y < \infty \]

Show E[X|Y = y] = y.
*13. Let X be exponential with mean $1/\lambda$; that is,

\[ f_X(x) = \lambda e^{-\lambda x}, \quad 0 < x < \infty \]

Find E[X|X > 1].
14. Let X be uniform over (0, 1). Find E[X|X < 1/2].
15. The joint density of X and Y is given by

\[ f(x, y) = \frac{e^{-y}}{y}, \quad 0 < x < y, \; 0 < y < \infty \]

Compute $E[X^2 | Y = y]$.
16. The random variables X and Y are said to have a bivariate normal distribution if
their joint density function is given by

\[ f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2
- \frac{2\rho (x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y}
+ \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\} \]

for $-\infty < x < \infty$, $-\infty < y < \infty$, where $\sigma_x$, $\sigma_y$, $\mu_x$, $\mu_y$, and $\rho$ are constants such
that $-1 < \rho < 1$, $\sigma_x > 0$, $\sigma_y > 0$, $-\infty < \mu_x < \infty$, $-\infty < \mu_y < \infty$.

(a) Show that X is normally distributed with mean $\mu_x$ and variance $\sigma_x^2$, and Y is
normally distributed with mean $\mu_y$ and variance $\sigma_y^2$.

(b) Show that the conditional density of X given that Y = y is normal with mean
$\mu_x + (\rho \sigma_x / \sigma_y)(y - \mu_y)$ and variance $\sigma_x^2 (1 - \rho^2)$.

The quantity $\rho$ is called the correlation between X and Y. It can be shown that

\[ \rho = \frac{E[(X - \mu_x)(Y - \mu_y)]}{\sigma_x \sigma_y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} \]

17. Let Y be a gamma random variable with parameters $(s, \alpha)$. That is, its density is

\[ f_Y(y) = C e^{-\alpha y} y^{s-1}, \quad y > 0 \]

where C is a constant that does not depend on y. Suppose also that the conditional
distribution of X given that Y = y is Poisson with mean y. That is,

\[ P\{X = i | Y = y\} = e^{-y} y^i / i!, \quad i \ge 0 \]

Show that the conditional distribution of Y given that X = i is the gamma
distribution with parameters $(s + i, \alpha + 1)$.

18. Let $X_1, \ldots, X_n$ be independent random variables having a common distribution
function that is specified up to an unknown parameter $\theta$. Let T = T(X) be a function
of the data $X = (X_1, \ldots, X_n)$. If the conditional distribution of $X_1, \ldots, X_n$ given
T(X) does not depend on $\theta$ then T(X) is said to be a sufficient statistic for $\theta$. In the
following cases, show that $T(X) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

(a) The $X_i$ are normal with mean $\theta$ and variance 1.

(b) The density of $X_i$ is $f(x) = \theta e^{-\theta x}$, x > 0.

(c) The mass function of $X_i$ is $p(x) = \theta^x (1 - \theta)^{1-x}$, x = 0, 1, $0 < \theta < 1$.

(d) The $X_i$ are Poisson random variables with mean $\theta$.

*19. Prove that if X and Y are jointly continuous, then

\[ E[X] = \int_{-\infty}^{\infty} E[X | Y = y] f_Y(y) \, dy \]

20. An individual whose level of exposure to a certain pathogen is x will contract the
disease caused by this pathogen with probability P(x). If the exposure level of a
randomly chosen member of the population has probability density function f,
determine the conditional probability density of the exposure level of that member
given that he or she

(a) has the disease.

(b) does not have the disease.

(c) Show that when P(x) increases in x, then the ratio of the density of part (a) to
that of part (b) also increases in x.

21. Consider Example 3.13, which refers to a miner trapped in a mine. Let N denote
the total number of doors selected before the miner reaches safety. Also, let $T_i$
denote the travel time corresponding to the ith choice, $i \ge 1$. Again let X denote
the time when the miner reaches safety.

(a) Give an identity that relates X to N and the $T_i$.

(b) What is E[N]?

(c) What is $E[T_N]$?

(d) What is $E[\sum_{i=1}^{N} T_i | N = n]$?

(e) Using the preceding, what is E[X]?

22. Suppose that independent trials, each of which is equally likely to have any of m
possible outcomes, are performed until the same outcome occurs k consecutive
times. If N denotes the number of trials, show that

\[ E[N] = \frac{m^k - 1}{m - 1} \]

Some people believe that the successive digits in the expansion of $\pi = 3.14159\ldots$
are "uniformly" distributed. That is, they believe that these digits have all the
appearance of being independent choices from a distribution that is equally likely
to be any of the digits from 0 through 9. Possible evidence against this hypothesis
is the fact that starting with the 24,658,601st digit there is a run of nine successive
7s. Is this information consistent with the hypothesis of a uniform distribution?

To answer this, we note from the preceding that if the uniform hypothesis were
correct, then the expected number of digits until a run of nine of the same value
occurs is

\[ (10^9 - 1)/9 = 111,111,111 \]

Thus, the actual value of approximately 25 million is roughly 22 percent of the
theoretical mean. However, it can be shown that under the uniformity assumption
the standard deviation of N will be approximately equal to the mean. As a result, the
observed value is approximately 0.78 standard deviations less than its theoretical
mean and is thus quite consistent with the uniformity assumption.

*23. A coin having probability p of coming up heads is successively flipped until two of
the most recent three flips are heads. Let N denote the number of flips. (Note that
if the first two flips are heads, then N = 2.) Find E[N].

24. A coin, having probability p of landing heads, is continually flipped until at least
one head and one tail have been flipped.

(a) Find the expected number of flips needed.

(b) Find the expected number of flips that land on heads.

(c) Find the expected number of flips that land on tails.

(d) Repeat part (a) in the case where flipping is continued until a total of at least
two heads and one tail have been flipped.

25. Independent trials, resulting in one of the outcomes 1, 2, 3 with respective
probabilities $p_1, p_2, p_3$, $\sum_{i=1}^{3} p_i = 1$, are performed.

(a) Let N denote the number of trials needed until the initial outcome has occurred
exactly 3 times. For instance, if the trial results are 3, 2, 1, 2, 3, 2, 3 then N = 7.
Find E[N].

(b) Find the expected number of trials needed until both outcome 1 and outcome 2
have occurred.

26. You have two opponents with whom you alternate play. Whenever you play A,
you win with probability $p_A$; whenever you play B, you win with probability $p_B$,
where $p_B > p_A$. If your objective is to minimize the expected number of games you
need to play to win two in a row, should you start with A or with B?

Hint: Let $E[N_i]$ denote the mean number of games needed if you initially play i.
Derive an expression for $E[N_A]$ that involves $E[N_B]$; write down the equivalent
expression for $E[N_B]$ and then subtract.

27. A coin that comes up heads with probability p is continually flipped until the pattern
T, T, H appears. (That is, you stop flipping when the most recent flip lands heads,
and the two immediately preceding it land tails.) Let X denote the number of flips
made, and find E[X].

28. Polya's urn model supposes that an urn initially contains r red and b blue balls. At
each stage a ball is randomly selected from the urn and is then returned along with
m other balls of the same color. Let $X_k$ be the number of red balls drawn in the
first k selections.

(a) Find $E[X_1]$.

(b) Find $E[X_2]$.

(c) Find $E[X_3]$.

(d) Conjecture the value of $E[X_k]$, and then verify your conjecture by a conditioning
argument.

(e) Give an intuitive proof for your conjecture.

Hint: Number the initial r red and b blue balls, so the urn contains one type i red
ball, for each i = 1, ..., r; as well as one type j blue ball, for each j = 1, ..., b.
Now suppose that whenever a red ball is chosen it is returned along with m others
of the same type, and similarly whenever a blue ball is chosen it is returned along
with m others of the same type. Now, use a symmetry argument to determine the
probability that any given selection is red.

29. Two players take turns shooting at a target, with each shot by player i hitting the
target with probability $p_i$, i = 1, 2. Shooting ends when two consecutive shots hit
the target. Let $\mu_i$ denote the mean number of shots taken when player i shoots first,
i = 1, 2.

(a) Find $\mu_1$ and $\mu_2$.

(b) Let $h_i$ denote the mean number of times that the target is hit when player i
shoots first, i = 1, 2. Find $h_1$ and $h_2$.

30. Let $X_i, i \ge 0$ be independent and identically distributed random variables with
probability mass function

\[ p(j) = P\{X_i = j\}, \quad j = 1, \ldots, m, \qquad \sum_{j=1}^{m} p(j) = 1 \]

Find E[N], where $N = \min\{n > 0 : X_n = X_0\}$.

31. Each element in a sequence of binary data is either 1 with probability p or 0 with
probability 1 − p. A maximal subsequence of consecutive values having identical
outcomes is called a run. For instance, if the outcome sequence is 1, 1, 0, 1, 1, 1, 0,
the first run is of length 2, the second is of length 1, and the third is of length 3.

(a) Find the expected length of the first run.

(b) Find the expected length of the second run.

32. Independent trials, each resulting in success with probability p, are performed.

(a) Find the expected number of trials needed for there to have been both at least
n successes and at least m failures.

Hint: Is it useful to know the result of the first n + m trials?

(b) Find the expected number of trials needed for there to have been either at least
n successes or at least m failures.

Hint: Make use of the result from part (a).


33. If $R_i$ denotes the random amount that is earned in period i, then $\sum_{i=1}^{\infty} \beta^{i-1} R_i$,
where $0 < \beta < 1$ is a specified constant, is called the total discounted reward with
discount factor $\beta$. Let T be a geometric random variable with parameter $1 - \beta$
that is independent of the $R_i$. Show that the expected total discounted reward
is equal to the expected total (undiscounted) reward earned by time T. That is,
show that

\[ E\left[ \sum_{i=1}^{\infty} \beta^{i-1} R_i \right] = E\left[ \sum_{i=1}^{T} R_i \right] \]

34. A set of n dice is thrown. All those that land on six are put aside, and the others are
again thrown. This is repeated until all the dice have landed on six. Let N denote
the number of throws needed. (For instance, suppose that n = 3 and that on the
initial throw exactly two of the dice land on six. Then the other die will be thrown,
and if it lands on six, then N = 2.) Let $m_n = E[N]$.

(a) Derive a recursive formula for $m_n$ and use it to calculate $m_i$, i = 2, 3, 4 and to
show that $m_5 \approx 13.024$.

(b) Let $X_i$ denote the number of dice rolled on the ith throw. Find $E[\sum_{i=1}^{N} X_i]$.

35. Consider n multinomial trials, where each trial independently results in outcome i
with probability $p_i$, $\sum_{i=1}^{k} p_i = 1$. With $X_i$ equal to the number of trials that result
in outcome i, find $E[X_1 | X_2 > 0]$.

36. Let $p_0 = P\{X = 0\}$ and suppose that $0 < p_0 < 1$. Let $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}(X)$.

(a) Find $E[X | X \ne 0]$.

(b) Find $\mathrm{Var}(X | X \ne 0)$.

37. A manuscript is sent to a typing firm consisting of typists A, B, and C. If it is typed
by A, then the number of errors made is a Poisson random variable with mean 2.6;
if typed by B, then the number of errors is a Poisson random variable with mean 3;
and if typed by C, then it is a Poisson random variable with mean 3.4. Let X denote
the number of errors in the typed manuscript. Assume that each typist is equally
likely to do the work.

(a) Find E[X].

(b) Find Var(X).

38. Let U be a uniform (0, 1) random variable. Suppose that n trials are to be performed
and that conditional on U = u these trials will be independent with a common
success probability u. Compute the mean and variance of the number of successes
that occur in these trials.

39. A deck of n cards, numbered 1 through n, is randomly shuffled so that all n! possible
permutations are equally likely. The cards are then turned over one at a time until
card number 1 appears. These upturned cards constitute the first cycle. We now
determine (by looking at the upturned cards) the lowest numbered card that has
not yet appeared, and we continue to turn the cards face up until that card appears.
This new set of cards represents the second cycle. We again determine the lowest
numbered of the remaining cards and turn the cards until it appears, and so on until
all cards have been turned over. Let $m_n$ denote the mean number of cycles.

(a) Derive a recursive formula for $m_n$ in terms of $m_k$, k = 1, ..., n − 1.

(b) Starting with $m_0 = 0$, use the recursion to find $m_1, m_2, m_3$, and $m_4$.

(c) Conjecture a general formula for $m_n$.

(d) Prove your formula by induction on n. That is, show it is valid for n = 1, then
assume it is true for any of the values 1, ..., n − 1 and show that this implies
it is true for n.

(e) Let $X_i$ equal 1 if one of the cycles ends with card i, and let it equal 0 otherwise,
i = 1, ..., n. Express the number of cycles in terms of these $X_i$.

(f) Use the representation in part (e) to determine $m_n$.

(g) Are the random variables $X_1, \ldots, X_n$ independent? Explain.

(h) Find the variance of the number of cycles.


40. A prisoner is trapped in a cell containing three doors. The first door leads to a
tunnel that returns him to his cell after two days of travel. The second leads to a 
tunnel that returns him to his cell after three days of travel. The third door leads 
immediately to freedom. 

(a) Assuming that the prisoner will always select doors 1, 2, and 3 with prob- 
abilities 0.5, 0.3, 0.2, what is the expected number of days until he reaches 
freedom? 

(b) Assuming that the prisoner is always equally likely to choose among those 
doors that he has not used, what is the expected number of days until he 
reaches freedom? (In this version, for instance, if the prisoner initially tries 
door 1, then when he returns to the cell, he will now select only from doors 2 
and 3.) 

(c) For parts (a) and (b) find the variance of the number of days until the prisoner 
reaches freedom. 


41. A rat is trapped in a maze. Initially it has to choose one of two directions. If it
goes to the right, then it will wander around in the maze for three minutes and will
then return to its initial position. If it goes to the left, then with probability 1/3 it
will depart the maze after two minutes of traveling, and with probability 2/3 it will
return to its initial position after five minutes of traveling. Assuming that the rat is
at all times equally likely to go to the left or the right, what is the expected number
of minutes that it will be trapped in the maze?


42. If $X_i$, i = 1, ..., n are independent normal random variables, with $X_i$ having mean
$\mu_i$ and variance 1, then the random variable $\sum_{i=1}^{n} X_i^2$ is said to be a noncentral
chi-squared random variable.

(a) If X is a normal random variable having mean $\mu$ and variance 1 show, for
|t| < 1/2, that the moment generating function of $X^2$ is

\[ (1 - 2t)^{-1/2} e^{\frac{t\mu^2}{1 - 2t}} \]

(b) Derive the moment generating function of the noncentral chi-squared random
variable $\sum_{i=1}^{n} X_i^2$, and show that its distribution depends on the sequence of
means $\mu_1, \ldots, \mu_n$ only through the sum of their squares. As a result, we say
that $\sum_{i=1}^{n} X_i^2$ is a noncentral chi-squared random variable with parameters n
and $\theta = \sum_{i=1}^{n} \mu_i^2$.

(c) If all $\mu_i = 0$, then $\sum_{i=1}^{n} X_i^2$ is called a chi-squared random variable with
n degrees of freedom. Determine, by differentiating its moment generating
function, its expected value and variance.

(d) Let K be a Poisson random variable with mean $\theta/2$, and suppose that conditional
on K = k, the random variable W has a chi-squared distribution with
n + 2k degrees of freedom. Show, by computing its moment generating function,
that W is a noncentral chi-squared random variable with parameters n
and $\theta$.

(e) Find the expected value and variance of a noncentral chi-squared random
variable with parameters n and $\theta$.


43. The density function of a chi-squared random variable having n degrees of freedom
can be shown to be

\[ f(x) = \frac{\frac{1}{2} e^{-x/2} (x/2)^{\frac{n}{2} - 1}}{\Gamma(n/2)}, \quad x > 0 \]

where $\Gamma(t)$ is the gamma function defined by

\[ \Gamma(t) = \int_0^{\infty} e^{-x} x^{t-1} \, dx, \quad t > 0 \]

Integration by parts can be employed to show that $\Gamma(t) = (t - 1)\Gamma(t - 1)$, when t > 1.
If Z and $\chi_n^2$ are independent random variables with Z having a standard normal
distribution and $\chi_n^2$ having a chi-square distribution with n degrees of freedom, then
the random variable T defined by

\[ T = \frac{Z}{\sqrt{\chi_n^2 / n}} \]

is said to have a t-distribution with n degrees of freedom. Compute its mean and
variance when n > 2.

44. The number of customers entering a store on a given day is Poisson distributed with
mean $\lambda = 10$. The amount of money spent by a customer is uniformly distributed
over (0, 100). Find the mean and variance of the amount of money that the store
takes in on a given day.


45. An individual traveling on the real line is trying to reach the origin. However,
the larger the desired step, the greater is the variance in the result of that step.
Specifically, whenever the person is at location x, he next moves to a location
having mean 0 and variance $\beta x^2$. Let $X_n$ denote the position of the individual after
having taken n steps. Supposing that $X_0 = x_0$, find

(a) $E[X_n]$;

(b) $\mathrm{Var}(X_n)$.

46. (a) Show that

\[ \mathrm{Cov}(X, Y) = \mathrm{Cov}(X, E[Y | X]) \]

(b) Suppose that, for constants a and b,

\[ E[Y | X] = a + bX \]

Show that

\[ b = \mathrm{Cov}(X, Y) / \mathrm{Var}(X) \]

47. If E[Y | X] = 1, show that

\[ \mathrm{Var}(XY) \ge \mathrm{Var}(X) \]

48. Suppose that we want to predict the value of a random variable X by using one
of the predictors $Y_1, \ldots, Y_n$, each of which satisfies $E[Y_i | X] = X$. Show that the
predictor $Y_i$ that minimizes $E[(Y_i - X)^2]$ is the one whose variance is smallest.

Hint: Compute $\mathrm{Var}(Y_i)$ by using the conditional variance formula.


49. A and B play a series of games with A winning each game with probability p. The
overall winner is the first player to have won two more games than the other. 

(a) Find the probability that A is the overall winner. 

(b) Find the expected number of games played. 


50. There are three coins in a barrel. These coins, when flipped, will come up heads
with respective probabilities 0.3, 0.5, 0.7. A coin is randomly selected from among
these three and is then flipped ten times. Let N be the number of heads obtained
on the ten flips.

(a) Find P{N = 0}.

(b) Find P{N = n}, n = 0, 1, ..., 10.

(c) Does N have a binomial distribution?

(d) If you win $1 each time a head appears and you lose $1 each time a tail appears,
is this a fair game? Explain.

51. If X is geometric with parameter p, find the probability that X is even.


52. Suppose that X and Y are independent random variables with probability density
functions fx and fy. Determine a one-dimensional integral expression for P{X + 
Y < x}. 

Suppose X is a Poisson random variable with mean A. The parameter A is itself a 
random variable whose distribution is exponential with mean 1. Show that P{X = 
nya (Gh 

A coin is randomly selected from a group of ten coins, the \(n\)th coin having a probability 
\(n/10\) of coming up heads. The coin is then repeatedly flipped until a head 
appears. Let N denote the number of flips necessary. What is the probability dis- 
tribution of N? Is N a geometric random variable? When would N be a geometric 
random variable; that is, what would have to be done differently? 


You are invited to a party. Suppose the times at which invitees arrive are independent 
uniform (0,1) random variables. Suppose that, aside from yourself, the number of 
other people who are invited is a Poisson random variable with mean 10. 

(a) Find the expected number of people who arrive before you. 

(b) Find the probability that you are the \(n\)th person to arrive. 


Data indicate that the number of traffic accidents in Berkeley on a rainy day is a 
Poisson random variable with mean 9, whereas on a dry day it is a Poisson random 
variable with mean 3. Let X denote the number of traffic accidents tomorrow. If it 
will rain tomorrow with probability 0.6, find 

(a) E[X]; 

(b) P{X = 0}; 

(c) Var(X). 


The number of storms in the upcoming rainy season is Poisson distributed but with 
a parameter value that is uniformly distributed over (0, 5). That is, \(\Lambda\) is uniformly 
distributed over (0, 5), and given that \(\Lambda = \lambda\), the number of storms is Poisson with 
mean \(\lambda\). Find the probability there are at least three storms this season. 


A collection of \(n\) coins is flipped. The outcomes are independent, and the \(i\)th coin 
comes up heads with probability \(\alpha_i\), \(i = 1, \ldots, n\). Suppose that for some value of 
\(j\), \(1 \le j \le n\), \(\alpha_j = 1/2\). Find the probability that the total number of heads to appear 
on the \(n\) coins is an even number. 


Suppose each new coupon collected is, independent of the past, a type \(i\) coupon 
with probability \(p_i\). A total of \(n\) coupons is to be collected. Let \(A_i\) be the event that 
there is at least one type \(i\) in this set. For \(i \ne j\), compute \(P(A_i A_j)\) by 

(a) conditioning on \(N_i\), the number of type \(i\) coupons in the set of \(n\) coupons; 

(b) conditioning on \(F_i\), the first time a type \(i\) coupon is collected; 

(c) using the identity \(P(A_i \cup A_j) = P(A_i) + P(A_j) - P(A_i A_j)\). 

Two players alternate flipping a coin that comes up heads with probability \(p\). The 
first one to obtain a head is declared the winner. We are interested in the probability 
that the first player to flip is the winner. Before determining this probability, which 
we will call \(f(p)\), answer the following questions. 

(a) Do you think that \(f(p)\) is a monotone function of \(p\)? If so, is it increasing or 
decreasing? 

(b) What do you think is the value of \(\lim_{p \to 1} f(p)\)? 

(c) What do you think is the value of \(\lim_{p \to 0} f(p)\)? 

(d) Find \(f(p)\). 

Suppose in Exercise 29 that the shooting ends when the target has been hit twice. 
Let \(m_i\) denote the mean number of shots needed for the first hit when player \(i\) shoots 
first, \(i = 1, 2\). Also, let \(P_i\), \(i = 1, 2\), denote the probability that the first hit is by 
player 1, when player \(i\) shoots first. 

(a) Find \(m_1\) and \(m_2\). 

(b) Find \(P_1\) and \(P_2\). 

For the remainder of the problem, assume that player 1 shoots first. 

(c) Find the probability that the final hit was by 1. 

(d) Find the probability that both hits were by 1. 

(e) Find the probability that both hits were by 2. 

(f) Find the mean number of shots taken. 


A, B, and C are evenly matched tennis players. Initially A and B play a set, and 
the winner then plays C. This continues, with the winner always playing the 
waiting player, until one of the players has won two sets in a row. That player 
is then declared the overall winner. Find the probability that A is the overall 
winner. 


Suppose there are \(n\) types of coupons, and that the type of each new coupon obtained 
is independent of past selections and is equally likely to be any of the \(n\) types. 
Suppose one continues collecting until a complete set of at least one of each type is 
obtained. 

(a) Find the probability that there is exactly one type \(i\) coupon in the final 
collection. 


Hint: Condition on T, the number of types that are collected before the first type 
\(i\) appears. 


(b) Find the expected number of types that appear exactly once in the final 
collection. 


A and B roll a pair of dice in turn, with A rolling first. A’s objective is to obtain a 
sum of 6, and B’s is to obtain a sum of 7. The game ends when either player reaches 
his or her objective, and that player is declared the winner. 

(a) Find the probability that A is the winner. 

(b) Find the expected number of rolls of the dice. 

(c) Find the variance of the number of rolls of the dice. 


The number of red balls in an urn that contains \(n\) balls is a random variable that is 
equally likely to be any of the values \(0, 1, \ldots, n\). That is, 

\[ P\{i \text{ red},\ n - i \text{ non-red}\} = \frac{1}{n + 1}, \quad i = 0, \ldots, n \]

The \(n\) balls are then randomly removed one at a time. Let \(Y_k\) denote the number of 
red balls in the first \(k\) selections, \(k = 1, \ldots, n\). 

(a) Find \(P\{Y_n = j\}\), \(j = 0, \ldots, n\). 

(b) Find \(P\{Y_{n-1} = j\}\), \(j = 0, \ldots, n - 1\). 

(c) What do you think is the value of \(P\{Y_k = j\}\), \(j = 0, \ldots, k\)? 

(d) Verify your answer to part (c) by a backwards induction argument. That is, 
check that your answer is correct when \(k = n\), and then show that whenever 
it is true for \(k\) it is also true for \(k - 1\), \(k = 1, \ldots, n\). 


The opponents of soccer team A are of two types: either they are a class 1 or a 
class 2 team. The number of goals team A scores against a class i opponent is a 
Poisson random variable with mean \(\lambda_i\), where \(\lambda_1 = 2\), \(\lambda_2 = 3\). This weekend the 
team has two games against teams they are not very familiar with. Assuming that 
the first team they play is a class 1 team with probability 0.6 and the second is, 
independently of the class of the first team, a class 1 team with probability 0.3, 
determine 

(a) the expected number of goals team A will score this weekend. 

(b) the probability that team A will score a total of five goals. 

A coin having probability \(p\) of coming up heads is continually flipped. Let \(P_j(n)\) 
denote the probability that a run of \(j\) successive heads occurs within the first \(n\) 
flips. 

(a) Argue that 

\[ P_j(n) = P_j(n - 1) + p^j (1 - p)\,[1 - P_j(n - j - 1)] \]

(b) By conditioning on the first non-head to appear, derive another equation 
relating \(P_j(n)\) to the quantities \(P_j(n - k)\), \(k = 1, \ldots, j\). 


In a knockout tennis tournament of \(2^n\) contestants, the players are paired and 
play a match. The losers depart, the remaining \(2^{n-1}\) players are paired, and they 
play a match. This continues for \(n\) rounds, after which a single player remains 
unbeaten and is declared the winner. Suppose that the contestants are numbered 1 
through \(2^n\), and that whenever two players contest a match, the lower numbered 
one wins with probability \(p\). Also suppose that the pairings of the remaining players 
are always done at random so that all possible pairings for that round are equally 
likely. 

(a) What is the probability that player 1 wins the tournament? 

(b) What is the probability that player 2 wins the tournament? 


Hint: Imagine that the random pairings are done in advance of the tournament. 
That is, the first-round pairings are randomly determined; the \(2^{n-1}\) first-round pairs 
are then themselves randomly paired, with the winners of each pair to play in round 
2; these \(2^{n-2}\) groupings (of four players each) are then randomly paired, with the 
winners of each grouping to play in round 3, and so on. Say that players \(i\) and \(j\) are 
scheduled to meet in round \(k\) if, provided they both win their first \(k - 1\) matches, 
they will meet in round \(k\). Now condition on the round in which players 1 and 2 
are scheduled to meet. 

In the match problem, say that \((i, j)\), \(i < j\), is a pair if \(i\) chooses \(j\)'s hat and \(j\) chooses 
\(i\)'s hat. 

(a) Find the expected number of pairs. 

(b) Let \(Q_n\) denote the probability that there are no pairs, and derive a recursive 
formula for \(Q_n\) in terms of \(Q_j\), \(j < n\). 

Hint: Use the cycle concept. 

(c) Use the recursion of part (b) to find \(Q_8\). 


Let N denote the number of cycles that result in the match problem. 

(a) Let \(M_n = E[N]\), and derive an equation for \(M_n\) in terms of \(M_1, \ldots, M_{n-1}\). 

(b) Let \(C_j\) denote the size of the cycle that contains person \(j\). Argue that 

\[ N = \sum_{j=1}^{n} 1/C_j \]

and use the preceding to determine \(E[N]\). 

(c) Find the probability that persons \(1, 2, \ldots, k\) are all in the same cycle. 

(d) Find the probability that \(1, 2, \ldots, k\) is a cycle. 

Use Equation (3.14) to obtain Equation (3.10). 

Hint: First multiply both sides of Equation (3.14) by \(n\), then write a new equation 
by replacing \(n\) with \(n - 1\), and then subtract the former from the latter. 


In Example 3.28 show that the conditional distribution of N given that \(U_1 = y\) is 
the same as the conditional distribution of M given that \(U_1 = 1 - y\). Also, show 
that 

\[ E[N \mid U_1 = y] = E[M \mid U_1 = 1 - y] = 1 + e^y \]


Suppose that we continually roll a die until the sum of all throws exceeds 100. What 
is the most likely value of this total when you stop? 


There are five components. The components act independently, with component \(i\) 
working with probability \(p_i\), \(i = 1, 2, 3, 4, 5\). These components form a system as 
shown in Figure 3.7. 

The system is said to work if a signal originating at the left end of the diagram can 
reach the right end, where it can pass through a component only if that component 
is working. (For instance, if components 1 and 4 both work, then the system also 
works.) What is the probability that the system works? 


This problem will present another proof of the ballot problem of Example 3.27. 

(a) Argue that 

\[ P_{n,m} = 1 - P\{A \text{ and } B \text{ are tied at some point}\} \]

(b) Explain why 

\[ P\{A \text{ receives first vote and they are eventually tied}\} = P\{B \text{ receives first vote and they are eventually tied}\} \]

Hint: Any outcome in which they are eventually tied with A receiving the first vote 
corresponds to an outcome in which they are eventually tied with B receiving the 
first vote. Explain this correspondence. 

(c) Argue that \(P\{\text{eventually tied}\} = 2m/(n + m)\), and conclude that 
\(P_{n,m} = (n - m)/(n + m)\). 


Consider a gambler who on each bet either wins 1 with probability 18/38 or loses 
1 with probability 20/38. (These are the probabilities if the bet is that a roulette 
wheel will land on a specified color.) The gambler will quit either when he or she 
is winning a total of 5 or after 100 plays. What is the probability he or she plays 
exactly 15 times? 


Show that 

(a) \(E[XY \mid Y = y] = y\,E[X \mid Y = y]\) 

(b) \(E[g(X, Y) \mid Y = y] = E[g(X, y) \mid Y = y]\) 

(c) \(E[XY] = E[Y\,E[X \mid Y]]\) 

In the ballot problem (Example 3.27), compute P{A is never behind}. 


An urn contains \(n\) white and \(m\) black balls that are removed one at a time. If 
\(n > m\), show that the probability that there are always more white than black balls 
in the urn (until, of course, the urn is empty) equals \((n - m)/(n + m)\). Explain why 
this probability is equal to the probability that the set of withdrawn balls always 
contains more white than black balls. (This latter probability is \((n - m)/(n + m)\) 
by the ballot problem.) 

80. A coin that comes up heads with probability \(p\) is flipped \(n\) consecutive times. What 
is the probability that starting with the first flip there are always more heads than 
tails that have appeared? 


81. Let \(X_i\), \(i \ge 1\), be independent uniform (0, 1) random variables, and define N by 

\[ N = \min\{n : X_n < X_{n-1}\} \]

where \(X_0 = x\). Let \(f(x) = E[N]\). 

(a) Derive an integral equation for \(f(x)\) by conditioning on \(X_1\). 

(b) Differentiate both sides of the equation derived in part (a). 

(c) Solve the resulting equation obtained in part (b). 

(d) For a second approach to determining \(f(x)\) argue that 

\[ P\{N \ge k\} = \frac{(1 - x)^{k-1}}{(k - 1)!} \]

(e) Use part (d) to obtain \(f(x)\). 


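82. Let \(X_1, X_2, \ldots\) be independent continuous random variables with a common 
distribution function \(F\) and density \(f = F'\), and for \(k \ge 1\) let 

\[ N_k = \min\{n \ge k : X_n = k\text{th largest of } X_1, \ldots, X_n\} \]

(a) Show that \(P\{N_k = n\} = \frac{k - 1}{n(n - 1)}\), \(n \ge k\). 

(b) Argue that 

\[ f_{X_{N_k}}(x) = f(x)\,(\bar{F}(x))^{k-1} \sum_{i=0}^{\infty} \binom{i + k - 2}{i} (F(x))^i \]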
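(c) Prove the following identity: 

\[ a^{1-k} = \sum_{i=0}^{\infty} \binom{i + k - 2}{i} (1 - a)^i, \quad 0 < a < 1,\ k \ge 2 \]

Hint: Use induction. First prove it when \(k = 2\), and then assume it for \(k\). To prove 
it for \(k + 1\), use the fact that 

\[ \sum_{i=0}^{\infty} \binom{i + k - 1}{i} (1 - a)^i = \sum_{i=0}^{\infty} \binom{i + k - 2}{i} (1 - a)^i + \sum_{i=1}^{\infty} \binom{i + k - 2}{i - 1} (1 - a)^i \]

where the preceding used the combinatorial identity 

\[ \binom{j}{i} = \binom{j - 1}{i} + \binom{j - 1}{i - 1} \]

Now, use the induction hypothesis to evaluate the first term on the right side of the 
preceding equation. 

(d) Conclude that \(X_{N_k}\) has distribution \(F\). 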


An urn contains \(n\) balls, with ball \(i\) having weight \(w_i\), \(i = 1, \ldots, n\). The balls are 
withdrawn from the urn one at a time according to the following scheme: When 
\(S\) is the set of balls that remains, ball \(i\), \(i \in S\), is the next ball withdrawn with 
probability \(w_i / \sum_{j \in S} w_j\). Find the expected number of balls that are withdrawn 
before ball \(i\), \(i = 1, \ldots, n\). 


In the list example of Section 3.6.1 suppose that the initial ordering at time \(t = 0\) is 
determined completely at random; that is, initially all \(n!\) permutations are equally 
likely. Following the front-of-the-line rule, compute the expected position of the 
element requested at time \(t\). 


Hint: To compute \(P\{e_j \text{ precedes } e_i \text{ at time } t\}\) condition on whether or not either \(e_i\) 
or \(e_j\) has ever been requested prior to \(t\). 


In the list problem, when the \(P_i\) are known, show that the best ordering (best in the 
sense of minimizing the expected position of the element requested) is to place the 
elements in decreasing order of their probabilities. That is, if \(P_1 > P_2 > \cdots > P_n\), 
show that \(1, 2, \ldots, n\) is the best ordering. 


Consider the random graph of Section 3.6.2 when \(n = 5\). Compute the probability 
distribution of the number of components and verify your solution by using it to 
compute \(E[C]\) and then comparing your solution with 

\[ E[C] = \sum_{k=1}^{5} \binom{5}{k} \frac{(k - 1)!}{5^k} \]


(a) From the results of Section 3.6.3 we can conclude that there are \(\binom{n + m - 1}{m - 1}\) 
nonnegative integer valued solutions of the equation \(x_1 + \cdots + x_m = n\). 
Prove this directly. 

(b) How many positive integer valued solutions of \(x_1 + \cdots + x_m = n\) are there? 

Hint: Let \(y_i = x_i - 1\). 

(c) For the Bose-Einstein distribution, compute the probability that exactly \(k\) of 
the \(X_i\) are equal to 0. 


In Section 3.6.3, we saw that if \(U\) is a random variable that is uniform on 
(0, 1) and if, conditional on \(U = p\), \(X\) is binomial with parameters \(n\) and \(p\), then 

\[ P\{X = i\} = \frac{1}{n + 1}, \quad i = 0, 1, \ldots, n \]

For another way of showing this result, let \(U, X_1, X_2, \ldots, X_n\) be independent 
uniform (0, 1) random variables. Define \(X\) by 

\[ X = \#\{i : X_i < U\} \]

That is, if the \(n + 1\) variables are ordered from smallest to largest, then \(U\) would 
be in position \(X + 1\). 

(a) What is \(P\{X = i\}\)? 

(b) Explain how this proves the result of Section 3.6.3. 


Let \(I_1, \ldots, I_n\) be independent random variables, each of which is equally likely to 
be either 0 or 1. A well-known nonparametric statistical test (called the signed rank 
test) is concerned with determining \(P_n(k)\) defined by 

\[ P_n(k) = P\left\{ \sum_{j=1}^{n} j I_j \le k \right\} \]

Justify the following formula: 

\[ P_n(k) = \tfrac{1}{2} P_{n-1}(k) + \tfrac{1}{2} P_{n-1}(k - n) \]


The number of accidents in each period is a Poisson random variable with mean 5. 
With \(X_n\), \(n \ge 1\), equal to the number of accidents in period \(n\), find \(E[N]\) when 

(a) \(N = \min(n : X_{n-2} = 2, X_{n-1} = 1, X_n = 0)\); 

(b) \(N = \min(n : X_{n-3} = 2, X_{n-2} = 1, X_{n-1} = 0, X_n = 2)\). 

Find the expected number of flips of a coin, which comes up heads with probabil- 
ity p, that are necessary to obtain the pattern h, t,h,h,t,h,t,h. 


The number of coins that Josh spots when walking to work is a Poisson random 
variable with mean 6. Each coin is equally likely to be a penny, a nickel, a dime, or 
a quarter. Josh ignores the pennies but picks up the other coins. 

(a) Find the expected amount of money that Josh picks up on his way to work. 

(b) Find the variance of the amount of money that Josh picks up on his way to 
work. 

(c) Find the probability that Josh picks up exactly 25 cents on his way to work. 


Consider a sequence of independent trials, each of which is equally likely to result in 
any of the outcomes \(0, 1, \ldots, m\). Say that a round begins with the first trial, and that 
a new round begins each time outcome 0 occurs. Let \(N\) denote the number of trials 
that it takes until all of the outcomes \(1, \ldots, m - 1\) have occurred in the same round. 
Also, let \(T_j\) denote the number of trials that it takes until \(j\) distinct outcomes have 
occurred, and let \(I_j\) denote the \(j\)th distinct outcome to occur. (Therefore, outcome 
\(I_j\) first occurs at trial \(T_j\).) 

(a) Argue that the random vectors \((I_1, \ldots, I_m)\) and \((T_1, \ldots, T_m)\) are independent. 

(b) Define \(X\) by letting \(X = j\) if outcome 0 is the \(j\)th distinct outcome to occur. 
(Thus, \(I_X = 0\).) Derive an equation for \(E[N]\) in terms of \(E[T_j]\), \(j = 1, \ldots, m - 1\) 
by conditioning on \(X\). 

(c) Determine \(E[T_j]\), \(j = 1, \ldots, m - 1\). 

Hint: See Exercise 42 of Chapter 2. 

(d) Find \(E[N]\). 


Let \(N\) be a hypergeometric random variable having the distribution of the number 
of white balls in a random sample of size \(r\) from a set of \(w\) white and \(b\) blue balls. 
That is, 

\[ P\{N = n\} = \frac{\binom{w}{n} \binom{b}{r - n}}{\binom{w + b}{r}} \]

where we use the convention that \(\binom{m}{j} = 0\) if either \(j < 0\) or \(j > m\). Now, consider 
a compound random variable \(S_N = \sum_{i=1}^{N} X_i\), where the \(X_i\) are positive integer 
valued random variables with \(\alpha_j = P\{X_i = j\}\). 

(a) With \(M\) as defined in Section 3.7, find the distribution of \(M - 1\). 

(b) Suppressing its dependence on \(b\), let \(P_{w,r}(k) = P\{S_N = k\}\), and derive a 
recursion equation for \(P_{w,r}(k)\). 

(c) Use the recursion of (b) to find \(P_{w,r}(2)\). 

For the left skip free random walk of Section 3.6.6 let \(\beta = P(S_n \le 0 \text{ for all } n)\) be 
the probability that the walk is never positive. Find \(\beta\) when \(E[X_i] < 0\). 


Consider a large population of families, and suppose that the number of children in 
the different families are independent Poisson random variables with mean \(\lambda\). Show 
that the number of siblings of a randomly chosen child is also Poisson distributed 
with mean \(\lambda\). 


Use the conditional variance formula to find the variance of a geometric random 
variable. 


4.1 Introduction 


Consider a process that has a value in each time period. Let \(X_n\) denote its value 
in time period \(n\), and suppose we want to make a probability model for the 
sequence of successive values \(X_0, X_1, X_2, \ldots\). The simplest model would probably 
be to assume that the \(X_n\) are independent random variables, but often such an 
assumption is clearly unjustified. For instance, starting at some time suppose 
that \(X_n\) represents the price of one share of some security, such as Google, at 
the end of \(n\) additional trading days. Then it certainly seems unreasonable to 
suppose that the price at the end of day \(n + 1\) is independent of the prices on days 
\(n, n - 1, n - 2\) and so on down to day 0. However, it might be reasonable to 
suppose that the price at the end of trading day \(n + 1\) depends on the previous 
end-of-day prices only through the price at the end of day \(n\). That is, it might be 
reasonable to assume that the conditional distribution of \(X_{n+1}\) given all the past 
end-of-day prices \(X_n, X_{n-1}, \ldots, X_0\) depends on these past prices only through 
the price at the end of day \(n\). Such an assumption defines a Markov chain, a 
type of stochastic process that will be studied in this chapter, and which we now 
formally define. 

Let \(\{X_n, n = 0, 1, 2, \ldots\}\) be a stochastic process that takes on a finite or 
countable number of possible values. Unless otherwise mentioned, this set of 
possible values of the process will be denoted by the set of nonnegative integers 
\(\{0, 1, 2, \ldots\}\). If \(X_n = i\), then the process is said to be in state \(i\) at time \(n\). 


Introduction to Probability Models, ISBN: 9780123756862 

We suppose that whenever the process is in state \(i\), there is a fixed probability \(P_{ij}\) 
that it will next be in state \(j\). That is, we suppose that 

\[ P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij} \tag{4.1} \]

for all states \(i_0, i_1, \ldots, i_{n-1}, i, j\) and all \(n \ge 0\). Such a stochastic process is known as 
a Markov chain. Equation (4.1) may be interpreted as stating that, for a Markov 
chain, the conditional distribution of any future state \(X_{n+1}\), given the past states 
\(X_0, X_1, \ldots, X_{n-1}\) and the present state \(X_n\), is independent of the past states and 
depends only on the present state. 

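The value \(P_{ij}\) represents the probability that the process will, when in state \(i\), 
next make a transition into state \(j\). Since probabilities are nonnegative and since 
the process must make a transition into some state, we have 

\[ P_{ij} \ge 0, \quad i, j \ge 0; \qquad \sum_{j=0}^{\infty} P_{ij} = 1, \quad i = 0, 1, \ldots \]

Let \(\mathbf{P}\) denote the matrix of one-step transition probabilities \(P_{ij}\), so that 

\[ \mathbf{P} = \begin{Vmatrix} P_{00} & P_{01} & P_{02} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots \\ \vdots & \vdots & \vdots & \\ P_{i0} & P_{i1} & P_{i2} & \cdots \\ \vdots & \vdots & \vdots & \end{Vmatrix} \]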

Example 4.1 (Forecasting the Weather) Suppose that the chance of rain tomorrow 
depends on previous weather conditions only through whether or not it is 
raining today and not on past weather conditions. Suppose also that if it rains 
today, then it will rain tomorrow with probability \(\alpha\); and if it does not rain today, 
then it will rain tomorrow with probability \(\beta\). 

If we say that the process is in state 0 when it rains and state 1 when it does 
not rain, then the preceding is a two-state Markov chain whose transition 
probabilities are given by 

\[ \mathbf{P} = \begin{Vmatrix} \alpha & 1 - \alpha \\ \beta & 1 - \beta \end{Vmatrix} \]
■


Example 4.2 (A Communications System) Consider a communications system 
that transmits the digits 0 and 1. Each digit transmitted must pass through several 
stages, at each of which there is a probability \(p\) that the digit entered will be 
unchanged when it leaves. Letting \(X_n\) denote the digit entering the \(n\)th stage, then 
\(\{X_n, n = 0, 1, \ldots\}\) is a two-state Markov chain having a transition probability 
matrix 

\[ \mathbf{P} = \begin{Vmatrix} p & 1 - p \\ 1 - p & p \end{Vmatrix} \]
■

Example 4.3 On any given day Gary is either cheerful (C), so-so (S), or glum 
(G). If he is cheerful today, then he will be C, S, or G tomorrow with respective 
probabilities 0.5, 0.4, 0.1. If he is feeling so-so today, then he will be C, S, or 
G tomorrow with probabilities 0.3, 0.4, 0.3. If he is glum today, then he will be 
C, S, or G tomorrow with probabilities 0.2, 0.3, 0.5. 

Letting \(X_n\) denote Gary's mood on the \(n\)th day, then \(\{X_n, n \ge 0\}\) is a three-state 
Markov chain (state 0 = C, state 1 = S, state 2 = G) with transition probability 
matrix 

\[ \mathbf{P} = \begin{Vmatrix} 0.5 & 0.4 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{Vmatrix} \]
■


Example 4.4 (Transforming a Process into a Markov Chain) Suppose that whether 
or not it rains today depends on previous weather conditions through the last two 
days. Specifically, suppose that if it has rained for the past two days, then it will 
rain tomorrow with probability 0.7; if it rained today but not yesterday, then 
it will rain tomorrow with probability 0.5; if it rained yesterday but not today, 
then it will rain tomorrow with probability 0.4; if it has not rained in the past 
two days, then it will rain tomorrow with probability 0.2. 

If we let the state at time \(n\) depend only on whether or not it is raining at time \(n\), 
then the preceding model is not a Markov chain (why not?). However, we can 
transform this model into a Markov chain by saying that the state at any time 
is determined by the weather conditions during both that day and the previous 
day. In other words, we can say that the process is in 

state 0 if it rained both today and yesterday, 
state 1 if it rained today but not yesterday, 
state 2 if it rained yesterday but not today, 
state 3 if it did not rain either yesterday or today. 

The preceding would then represent a four-state Markov chain having a transition 
probability matrix 

\[ \mathbf{P} = \begin{Vmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{Vmatrix} \]

You should carefully check the matrix \(\mathbf{P}\), and make sure you understand how it 
was obtained. ■ 
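
One way to check the matrix of Example 4.4 is to rebuild it mechanically from the four rain probabilities. The sketch below does this in Python; the names `RAIN_PROB`, `STATES`, and `build_matrix` are illustrative assumptions, not notation from the text, and each state is encoded as a pair (rained yesterday, rained today).

```python
# Probability of rain tomorrow, keyed by (rained yesterday, rained today).
RAIN_PROB = {
    (True, True): 0.7,
    (False, True): 0.5,
    (True, False): 0.4,
    (False, False): 0.2,
}

# States 0..3 of Example 4.4, written as (rained yesterday, rained today).
STATES = [(True, True), (False, True), (True, False), (False, False)]


def build_matrix():
    """Build the four-state transition matrix from the rain probabilities."""
    size = len(STATES)
    matrix = [[0.0] * size for _ in range(size)]
    for i, (yesterday, today) in enumerate(STATES):
        p_rain = RAIN_PROB[(yesterday, today)]
        # Tomorrow's state pairs today's weather with tomorrow's weather.
        matrix[i][STATES.index((today, True))] += p_rain
        matrix[i][STATES.index((today, False))] += 1 - p_rain
    return matrix


P = build_matrix()
for row in P:
    print([round(x, 2) for x in row])
```

The zero entries arise because tomorrow's state must agree with today's weather in its "yesterday" slot, so from each state only two of the four states are reachable.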


Example 4.5 (A Random Walk Model) A Markov chain whose state space is given 
by the integers \(i = 0, \pm 1, \pm 2, \ldots\) is said to be a random walk if, for some number 
\(0 < p < 1\), 

\[ P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 0, \pm 1, \ldots \]

The preceding Markov chain is called a random walk for we may think of it as 
being a model for an individual walking on a straight line who at each point of 
time either takes one step to the right with probability \(p\) or one step to the left 
with probability \(1 - p\). ■ 


Example 4.6 (A Gambling Model) Consider a gambler who, at each play of the 
game, either wins $1 with probability \(p\) or loses $1 with probability \(1 - p\). If we 
suppose that our gambler quits playing either when he goes broke or he attains 
a fortune of $N, then the gambler's fortune is a Markov chain having transition 
probabilities 

\[ P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 1, 2, \ldots, N - 1, \]
\[ P_{00} = P_{NN} = 1 \]

States 0 and N are called absorbing states since once entered they are never left. 
Note that the preceding is a finite state random walk with absorbing barriers 
(states 0 and N). ■ 
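
To make the structure of the gambling chain concrete, here is a minimal Python sketch that builds its transition matrix for a given target fortune and win probability. The function name `gambler_matrix` and the sample values N = 4, p = 0.6 are assumptions chosen for illustration only.

```python
def gambler_matrix(n_target, p):
    """Transition matrix on states 0..n_target for the gambling model."""
    size = n_target + 1
    matrix = [[0.0] * size for _ in range(size)]
    # States 0 and n_target are absorbing.
    matrix[0][0] = 1.0
    matrix[n_target][n_target] = 1.0
    # From an interior state the fortune moves up by 1 w.p. p, down by 1 w.p. 1 - p.
    for i in range(1, n_target):
        matrix[i][i + 1] = p
        matrix[i][i - 1] = 1 - p
    return matrix


example = gambler_matrix(4, 0.6)
for row in example:
    print(row)
```

Every row sums to 1, and the rows for states 0 and N contain a single 1 on the diagonal, which is exactly what makes those states absorbing.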


Example 4.7 In most of Europe and Asia annual automobile insurance premi- 
ums are determined by use of a Bonus Malus (Latin for Good-Bad) system. Each 
policyholder is given a positive integer valued state and the annual premium 
is a function of this state (along, of course, with the type of car being insured 
and the level of insurance). A policyholder’s state changes from year to year 
in response to the number of claims made by that policyholder. Because lower 
numbered states correspond to lower annual premiums, a policyholder’s state 
will usually decrease if he or she had no claims in the preceding year, and will 
generally increase if he or she had at least one claim. (Thus, no claims is good 
and typically results in a decreased premium, while claims are bad and typically 
result in a higher premium.) 

For a given Bonus Malus system, let \(s_i(k)\) denote the next state of a policyholder 
who was in state \(i\) in the previous year and who made a total of \(k\) claims in that 
year. If we suppose that the number of yearly claims made by a particular 
policyholder is a Poisson random variable with parameter \(\lambda\), then the successive states 
of this policyholder will constitute a Markov chain with transition probabilities 

\[ P_{i,j} = \sum_{k:\, s_i(k) = j} e^{-\lambda} \frac{\lambda^k}{k!}, \quad j \ge 0 \]

Whereas there are usually many states (20 or so is not atypical), the following 
table specifies a hypothetical Bonus Malus system having four states. 

                              Next state if
State  Annual Premium  0 claims  1 claim  2 claims  ≥ 3 claims
  1         200           1         2        3          4
  2         250           1         3        4          4
  3         400           2         4        4          4
  4         600           3         4        4          4

Thus, for instance, the table indicates that \(s_2(0) = 1\); \(s_2(1) = 3\); \(s_2(k) = 4\), 
\(k \ge 2\). Consider a policyholder whose annual number of claims is a Poisson 
random variable with parameter \(\lambda\). If \(a_k\) is the probability that such a policyholder 
makes \(k\) claims in a year, then 

\[ a_k = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \ge 0 \]

For the Bonus Malus system specified in the preceding table, the transition 
probability matrix of the successive states of this policyholder is 

\[ \mathbf{P} = \begin{Vmatrix} a_0 & a_1 & a_2 & 1 - a_0 - a_1 - a_2 \\ a_0 & 0 & a_1 & 1 - a_0 - a_1 \\ 0 & a_0 & 0 & 1 - a_0 \\ 0 & 0 & a_0 & 1 - a_0 \end{Vmatrix} \]
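
The passage from the table to the matrix can be sketched in code: for each current state, add the Poisson probability of k claims to the column of the resulting next state. In the Python sketch below, `NEXT_STATE`, `poisson_pmf`, and `bonus_malus_matrix` are illustrative names, and the value λ = 0.5 is an arbitrary assumption, not a value given in the text.

```python
import math

# NEXT_STATE[i] lists the next state after 0, 1, 2, and >= 3 claims from state i,
# copied from the hypothetical four-state table.
NEXT_STATE = {
    1: [1, 2, 3, 4],
    2: [1, 3, 4, 4],
    3: [2, 4, 4, 4],
    4: [3, 4, 4, 4],
}


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) random variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bonus_malus_matrix(lam):
    """Transition matrix of the four-state Bonus Malus chain (states 1..4)."""
    matrix = [[0.0] * 4 for _ in range(4)]
    for state, nxt in NEXT_STATE.items():
        for k in range(3):
            matrix[state - 1][nxt[k] - 1] += poisson_pmf(k, lam)
        # All claim counts of 3 or more share the last column of the table.
        tail = 1 - sum(poisson_pmf(k, lam) for k in range(3))
        matrix[state - 1][nxt[3] - 1] += tail
    return matrix


P = bonus_malus_matrix(0.5)  # lam = 0.5 is an assumed illustrative value
for row in P:
    print([round(x, 4) for x in row])
```

Because several claim counts can lead to the same next state, the probabilities are accumulated with `+=`, which mirrors the sum over \(k\) with \(s_i(k) = j\) in the formula for \(P_{i,j}\).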


4.2 Chapman-Kolmogorov Equations 


We have already defined the one-step transition probabilities \(P_{ij}\). We now define 
the \(n\)-step transition probabilities \(P^n_{ij}\) to be the probability that a process in state \(i\) 
will be in state \(j\) after \(n\) additional transitions. That is, 

\[ P^n_{ij} = P\{X_{n+k} = j \mid X_k = i\}, \quad n \ge 0,\ i, j \ge 0 \]

Of course \(P^1_{ij} = P_{ij}\). The Chapman-Kolmogorov equations provide a method for 
computing these \(n\)-step transition probabilities. These equations are 

\[ P^{n+m}_{ij} = \sum_{k=0}^{\infty} P^n_{ik} P^m_{kj} \quad \text{for all } n, m \ge 0, \text{ all } i, j \tag{4.2} \]

and are most easily understood by noting that \(P^n_{ik} P^m_{kj}\) represents the probability 
that starting in \(i\) the process will go to state \(j\) in \(n + m\) transitions through a 
path which takes it into state \(k\) at the \(n\)th transition. Hence, summing over all 
intermediate states \(k\) yields the probability that the process will be in state \(j\) after 
\(n + m\) transitions. Formally, we have 

\[ \begin{aligned} P^{n+m}_{ij} &= P\{X_{n+m} = j \mid X_0 = i\} \\ &= \sum_{k=0}^{\infty} P\{X_{n+m} = j, X_n = k \mid X_0 = i\} \\ &= \sum_{k=0}^{\infty} P\{X_{n+m} = j \mid X_n = k, X_0 = i\}\, P\{X_n = k \mid X_0 = i\} \\ &= \sum_{k=0}^{\infty} P^m_{kj} P^n_{ik} \end{aligned} \]

If we let \(\mathbf{P}^{(n)}\) denote the matrix of \(n\)-step transition probabilities \(P^n_{ij}\), then 
Equation (4.2) asserts that 

\[ \mathbf{P}^{(n+m)} = \mathbf{P}^{(n)} \cdot \mathbf{P}^{(m)} \]

where the dot represents matrix multiplication.* Hence, in particular, 

\[ \mathbf{P}^{(2)} = \mathbf{P}^{(1+1)} = \mathbf{P} \cdot \mathbf{P} = \mathbf{P}^2 \]

and by induction 

\[ \mathbf{P}^{(n)} = \mathbf{P}^{(n-1+1)} = \mathbf{P}^{n-1} \cdot \mathbf{P} = \mathbf{P}^n \]

That is, the \(n\)-step transition matrix may be obtained by multiplying the matrix 
\(\mathbf{P}\) by itself \(n\) times. 
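
As a numerical sanity check of the matrix form of the Chapman-Kolmogorov equations, the following Python sketch compares the five-step matrix with the product of the two-step and three-step matrices. It uses the mood chain of Example 4.3; the helper names `mat_mul` and `mat_pow` are illustrative assumptions, and plain lists are used so that no external library is needed.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(p, n):
    """Return the n-step matrix, that is, p multiplied by itself n times (n >= 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Transition matrix of Example 4.3 (states C, S, G).
P = [[0.5, 0.4, 0.1],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]

lhs = mat_pow(P, 5)                          # five-step matrix
rhs = mat_mul(mat_pow(P, 2), mat_pow(P, 3))  # two-step times three-step
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_gap)  # zero up to floating-point rounding
```

Each row of the resulting matrix still sums to 1, as it must for any matrix of \(n\)-step transition probabilities.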


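* If \(\mathbf{A}\) is an \(N \times M\) matrix whose element in the \(i\)th row and \(j\)th column is \(a_{ij}\) and \(\mathbf{B}\) is an \(M \times K\) 
matrix whose element in the \(i\)th row and \(j\)th column is \(b_{ij}\), then \(\mathbf{A} \cdot \mathbf{B}\) is defined to be the \(N \times K\) matrix 
whose element in the \(i\)th row and \(j\)th column is \(\sum_{k=1}^{M} a_{ik} b_{kj}\). 


Example 4.8 Consider Example 4.1 in which the weather is considered as a 
two-state Markov chain. If \(\alpha = 0.7\) and \(\beta = 0.4\), then calculate the probability 
that it will rain four days from today given that it is raining today. 


Solution: The one-step transition probability matrix is given by 

\[ \mathbf{P} = \begin{Vmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{Vmatrix} \]

Hence, 

\[ \mathbf{P}^{(2)} = \mathbf{P}^2 = \begin{Vmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{Vmatrix} \cdot \begin{Vmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{Vmatrix} = \begin{Vmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{Vmatrix}, \]

\[ \mathbf{P}^{(4)} = (\mathbf{P}^2)^2 = \begin{Vmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{Vmatrix} \cdot \begin{Vmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{Vmatrix} = \begin{Vmatrix} 0.5749 & 0.4251 \\ 0.5668 & 0.4332 \end{Vmatrix} \]

and the desired probability \(P^4_{00}\) equals 0.5749. ■ 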
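
The two squarings in Example 4.8 are easy to reproduce by machine. The short Python sketch below does so with a hand-written `mat_mul` helper (an illustrative name, not from the text).

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# One-step matrix of Example 4.8 (state 0 = rain, state 1 = no rain).
P = [[0.7, 0.3],
     [0.4, 0.6]]

P2 = mat_mul(P, P)    # two-step matrix
P4 = mat_mul(P2, P2)  # four-step matrix, obtained by squaring again

print([[round(x, 4) for x in row] for row in P4])
rain_in_four_days = P4[0][0]  # entry (0, 0): rain today, rain four days from today
```

Squaring twice uses two matrix multiplications instead of three, a saving that grows quickly for larger powers.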


Example 4.9 Consider Example 4.4. Given that it rained on Monday and 
Tuesday, what is the probability that it will rain on Thursday? 


Solution: The two-step transition matrix is given by 

\[ \mathbf{P}^{(2)} = \mathbf{P}^2 = \begin{Vmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{Vmatrix} \cdot \begin{Vmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{Vmatrix} = \begin{Vmatrix} 0.49 & 0.12 & 0.21 & 0.18 \\ 0.35 & 0.20 & 0.15 & 0.30 \\ 0.20 & 0.12 & 0.20 & 0.48 \\ 0.10 & 0.16 & 0.10 & 0.64 \end{Vmatrix} \]

Since rain on Thursday is equivalent to the process being in either state 0 or 
state 1 on Thursday, the desired probability is given by \(P^2_{00} + P^2_{01} = 0.49 + 
0.12 = 0.61\). ■ 
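
The answer to Example 4.9 sums two entries of the first row of the two-step matrix, because both state 0 and state 1 mean "rain today". A minimal Python sketch of that calculation follows; the variable names are illustrative assumptions.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Four-state matrix of Example 4.4.
P = [[0.7, 0.0, 0.3, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.4, 0.0, 0.6],
     [0.0, 0.2, 0.0, 0.8]]

P2 = mat_mul(P, P)

# Rain on Monday and Tuesday means the chain is in state 0 on Tuesday.
# Rain on Thursday means state 0 or state 1 two transitions later.
rain_thursday = P2[0][0] + P2[0][1]
print(round(rain_thursday, 2))
```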


Example 4.10 An urn always contains 2 balls. Ball colors are red and blue. At 
each stage a ball is randomly chosen and then replaced by a new ball, which with 
probability 0.8 is the same color, and with probability 0.2 is the opposite color, 
as the ball it replaces. If initially both balls are red, find the probability that the 
fifth ball selected is red. 


Solution: To find the desired probability we first define an appropriate 
Markov chain. This can be accomplished by noting that the probability that a 
selection is red is determined by the composition of the urn at the time of the 
selection. So, let us define X,, to be the number of red balls in the urn after the 
nth selection and subsequent replacement. Then X,,, > 0, is a Markov chain 
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with states 0, 1,2 and with transition probability matrix P given by 


0.8 0.2 O 
0.1 0.8 0.1 
0 02 0.8 


To understand the preceding, consider for instance P19. Now, to go from 1 
red ball in the urn to 0 red balls, the ball chosen must be red (which occurs 
with probability 0.5) and it must then be replaced by a ball of opposite color 
(which occurs with probability 0.2), showing that 


Py.9 = (0.5)(0.2) = 0.1 


To determine the probability that the fifth selection is red, condition on the
number of red balls in the urn after the fourth selection. This yields

$$
\begin{aligned}
P(\text{fifth selection is red})
&= \sum_{i=0}^{2} P(\text{fifth selection is red} \mid X_4 = i)\, P(X_4 = i \mid X_0 = 2) \\
&= (0)P^4_{2,0} + (0.5)P^4_{2,1} + (1)P^4_{2,2} \\
&= 0.5P^4_{2,1} + P^4_{2,2}
\end{aligned}
$$

To calculate the preceding we compute $P^4$. Doing so yields

$$P^4_{2,1} = 0.4352, \qquad P^4_{2,2} = 0.4872$$

giving the answer P(fifth selection is red) = 0.7048. ■
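
The computation in Example 4.10 can be checked numerically. The following is a minimal sketch, not part of the original solution; the helper names `mat_mul` and `mat_pow` are our own, and plain Python lists are used so that no external library is assumed.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, m):
    """Return the m-step transition matrix P^m by repeated multiplication."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, p)
    return result


# Transition matrix of Example 4.10; the state is the number of red balls.
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

P4 = mat_pow(P, 4)

# Probability that a selection is red when the urn holds i red balls.
red_given_state = [0.0, 0.5, 1.0]

# Condition on X_4, starting from X_0 = 2 (both balls red).
prob_fifth_red = sum(red_given_state[i] * P4[2][i] for i in range(3))
print(round(P4[2][1], 4), round(P4[2][2], 4), round(prob_fifth_red, 4))
```

Row 2 of `P4` holds the four-step probabilities out of the initial state, so the last sum is exactly the conditioning argument written above.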


Example 4.11 Suppose that balls are successively distributed among 8 urns, with 
each ball being equally likely to be put in any of these urns. What is the probability 
that there will be exactly 3 nonempty urns after 9 balls have been distributed? 


Solution: If we let $X_n$ be the number of nonempty urns after n balls have
been distributed, then $X_n, n \ge 0$ is a Markov chain with states 0, 1, ..., 8 and
transition probabilities

$$P_{i,i} = i/8 = 1 - P_{i,i+1}, \quad i = 0, 1, \ldots, 8$$

The desired probability is $P^9_{0,3} = P^8_{1,3}$, where the equality follows because
$P_{0,1} = 1$. Now, starting with 1 occupied urn, if we had wanted to determine
the entire probability distribution of the number of occupied urns after
8 additional balls had been distributed we would need to consider the transition
probability matrix with states 1, 2, ..., 8. However, because we only require
the probability, starting with a single occupied urn, that there are 3 occupied
urns after an additional 8 balls have been distributed, we can make use
of the fact that the state of the Markov chain cannot decrease to collapse all
states 4, 5, ..., 8 into a single state 4 with the interpretation that the state is
4 whenever four or more of the urns are occupied. Consequently, we need
only determine the eight-step transition probability $P^8_{1,3}$ of the Markov chain
with states 1, 2, 3, 4 having transition probability matrix P given by


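$$
P = \begin{bmatrix}
1/8 & 7/8 & 0 & 0 \\
0 & 2/8 & 6/8 & 0 \\
0 & 0 & 3/8 & 5/8 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

Raising the preceding matrix to the power 4 yields the matrix $P^4$ given by

$$
P^4 = \begin{bmatrix}
0.0002 & 0.0256 & 0.2563 & 0.7178 \\
0 & 0.0039 & 0.0952 & 0.9009 \\
0 & 0 & 0.0198 & 0.9802 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

Hence,

$$
P^8_{1,3} = 0.0002 \times 0.2563 + 0.0256 \times 0.0952 + 0.2563 \times 0.0198
+ 0.7178 \times 0 = 0.00756
$$
■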
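
As a check on Example 4.11, the collapsed four-state chain can be raised to the eighth power in exact arithmetic. This is a sketch under our own naming (`collapsed_urn_matrix`, `mat_pow` are hypothetical helpers). The second computation, a direct count of the ways 9 balls can occupy exactly 3 of the 8 urns, is an independent cross-check that we add and is not part of the original solution.

```python
from fractions import Fraction
from math import comb


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, m):
    """Return P^m using exact Fraction arithmetic."""
    n = len(p)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, p)
    return result


def collapsed_urn_matrix(urns=8, cap=4):
    """States 1..cap, where state `cap` means 'cap or more urns occupied'."""
    rows = []
    for i in range(1, cap):
        row = [Fraction(0)] * cap
        row[i - 1] = Fraction(i, urns)         # ball lands in an occupied urn
        row[i] = Fraction(urns - i, urns)      # ball lands in an empty urn
        rows.append(row)
    rows.append([Fraction(0)] * (cap - 1) + [Fraction(1)])  # top state absorbs
    return rows


P = collapsed_urn_matrix()
p_8_13 = mat_pow(P, 8)[0][2]   # eight-step probability from state 1 to state 3

# Cross-check: choose the 3 urns, then count maps of 9 balls onto all 3 of them.
onto_three = sum((-1) ** k * comb(3, k) * (3 - k) ** 9 for k in range(4))
direct = Fraction(comb(8, 3) * onto_three, 8 ** 9)

print(float(p_8_13), p_8_13 == direct)
```

The exact value is about 0.00757; the 0.00756 obtained in the text reflects the four-decimal rounding of the entries of $P^4$.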


So far, all of the probabilities we have considered are conditional probabilities. 
For instance, $P^n_{ij}$ is the probability that the state at time n is j given that the initial
state at time 0 is i. If the unconditional distribution of the state at time n is
desired, it is necessary to specify the probability distribution of the initial state.
Let us denote this by

$$\alpha_i \equiv P\{X_0 = i\}, \quad i \ge 0 \qquad \left(\sum_{i=0}^{\infty} \alpha_i = 1\right)$$

All unconditional probabilities may be computed by conditioning on the initial
state. That is,

$$
\begin{aligned}
P\{X_n = j\} &= \sum_{i=0}^{\infty} P\{X_n = j \mid X_0 = i\}\, P\{X_0 = i\} \\
&= \sum_{i=0}^{\infty} P^n_{ij}\, \alpha_i
\end{aligned}
$$

For instance, if $\alpha_0 = 0.4$, $\alpha_1 = 0.6$, in Example 4.8, then the (unconditional)
probability that it will rain four days after we begin keeping weather records is

$$
\begin{aligned}
P\{X_4 = 0\} &= 0.4P^4_{00} + 0.6P^4_{10} \\
&= (0.4)(0.5749) + (0.6)(0.5668) \\
&= 0.5700
\end{aligned}
$$


Consider a Markov chain with transition probabilities $P_{ij}$. Let $\mathcal{A}$ be a set of
states, and suppose we are interested in the probability that the Markov chain
ever enters any of the states in $\mathcal{A}$ by time m. That is, for a given state $i \notin \mathcal{A}$, we
are interested in determining

$$\beta = P(X_k \in \mathcal{A} \text{ for some } k = 1, \ldots, m \mid X_0 = i)$$

To determine the preceding probability we will define a Markov chain
$\{W_n, n \ge 0\}$ whose states are the states that are not in $\mathcal{A}$ plus an additional state, which we
will call A in our general discussion (though in specific examples we will usually
give it a different name). Once the $\{W_n\}$ Markov chain enters state A it remains
there forever.

The new Markov chain is defined as follows. Letting $X_n$ denote the state at
time n of the Markov chain with transition probabilities $P_{i,j}$, define

$$N = \min\{n : X_n \in \mathcal{A}\}$$

and let $N = \infty$ if $X_n \notin \mathcal{A}$ for all n. In words, N is the first time the Markov
chain enters the set of states $\mathcal{A}$. Now, define

$$
W_n = \begin{cases}
X_n, & \text{if } n < N \\
A, & \text{if } n \ge N
\end{cases}
$$


So the state of the $\{W_n\}$ process is equal to the state of the original Markov
chain up to the point when the original Markov chain enters a state in $\mathcal{A}$. At
that time the new process goes to state A and remains there forever. From this
description it follows that $W_n, n \ge 0$ is a Markov chain with states $i, i \notin \mathcal{A}$, A
and with transition probabilities $Q_{i,j}$, given by

$$
\begin{aligned}
Q_{i,j} &= P_{i,j}, && \text{if } i \notin \mathcal{A},\ j \notin \mathcal{A} \\
Q_{i,A} &= \sum_{j \in \mathcal{A}} P_{i,j}, && \text{if } i \notin \mathcal{A} \\
Q_{A,A} &= 1
\end{aligned}
$$

Because the original Markov chain will have entered a state in $\mathcal{A}$ by time m if
and only if the state at time m of the new Markov chain is A, we see that

$$
\begin{aligned}
P(X_k \in \mathcal{A} \text{ for some } k = 1, \ldots, m \mid X_0 = i)
&= P(W_m = A \mid X_0 = i) \\
&= P(W_m = A \mid W_0 = i) = Q^m_{i,A}
\end{aligned}
$$


That is, the desired probability is equal to an m-step transition probability of the 
new chain. 
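
The construction of the new chain is mechanical, and a short sketch may make it concrete. The function names below (`absorbing_chain`, `prob_enter_by`) and the three-state matrix used to exercise them are our own illustrative assumptions, not taken from the text; states are indexed from 0 and the absorbing state A is placed last.

```python
def absorbing_chain(P, A):
    """Build Q for the chain {W_n}.

    P is a list of rows and A is a set of state indices. States not in A
    keep their transition probabilities, and all probability of entering A
    is redirected to a single absorbing state stored as the last index.
    Returns (Q, keep), where keep lists the original states not in A.
    """
    keep = [s for s in range(len(P)) if s not in A]
    Q = []
    for i in keep:
        row = [P[i][j] for j in keep]            # Q_{i,j} = P_{i,j}
        row.append(sum(P[i][j] for j in A))      # Q_{i,A} = sum over j in A
        Q.append(row)
    Q.append([0.0] * len(keep) + [1.0])          # Q_{A,A} = 1
    return Q, keep


def mat_pow(p, m):
    """Return P^m by repeated multiplication of lists of rows."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = [[sum(result[i][k] * p[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def prob_enter_by(P, A, i, m):
    """P(X_k in A for some k = 1..m | X_0 = i), for a state i not in A."""
    Q, keep = absorbing_chain(P, A)
    return mat_pow(Q, m)[keep.index(i)][-1]


# Hypothetical three-state chain, used only to exercise the functions.
P_demo = [[0.5, 0.3, 0.2],
          [0.2, 0.6, 0.2],
          [0.1, 0.4, 0.5]]
print(prob_enter_by(P_demo, {2}, 0, 2))
```

For the demonstration chain the answer can be confirmed by hand: state 2 is entered at time 1 with probability 0.2, or first at time 2 with probability (0.5)(0.2) + (0.3)(0.2) = 0.16, for a total of 0.36.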


Example 4.12 A pensioner receives 2 (thousand dollars) at the beginning of 
each month. The amount of money he needs to spend during a month is independent
of the amount he has and is equal to i with probability $P_i$, $i = 1, 2, 3, 4$,
$\sum_{i=1}^{4} P_i = 1$. If the pensioner has more than 3 at the end of a month, he gives
the amount greater than 3 to his son. If, after receiving his payment at the 
beginning of a month, the pensioner has a capital of 5, what is the proba- 
bility that his capital is ever 1 or less at any time within the following four 
months? 


Solution: To find the desired probability, we consider a Markov chain 
with the state equal to the amount the pensioner has at the end of a 
month. Because we are interested in whether this amount ever falls as low 
as 1, we will let 1 mean that the pensioner’s end-of-month fortune has 
ever been less than or equal to 1. Because the pensioner will give any end- 
of-month amount greater than 3 to his son, we need only consider the 
Markov chain with states 1, 2, 3 and transition probability matrix $Q = [Q_{i,j}]$
given by

$$
Q = \begin{bmatrix}
1 & 0 & 0 \\
P_3 + P_4 & P_2 & P_1 \\
P_4 & P_3 & P_1 + P_2
\end{bmatrix}
$$


To understand the preceding, consider $Q_{2,1}$, the probability that a month that
ends with the pensioner having the amount 2 will be followed by a month that
ends with the pensioner having less than or equal to 1. Because the pensioner
will begin the new month with the amount 2 + 2 = 4, his ending capital will be
less than or equal to 1 if his expenses are either 3 or 4. Thus, $Q_{2,1} = P_3 + P_4$.
The other transition probabilities are similarly explained. 

Suppose now that $P_i = 1/4$, $i = 1, 2, 3, 4$. The transition probability
matrix is

$$
Q = \begin{bmatrix}
1 & 0 & 0 \\
1/2 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/2
\end{bmatrix}
$$

Squaring this matrix and then squaring the result gives the matrix

$$
Q^4 = \begin{bmatrix}
1 & 0 & 0 \\
\frac{222}{256} & \frac{13}{256} & \frac{21}{256} \\
\frac{201}{256} & \frac{21}{256} & \frac{34}{256}
\end{bmatrix}
$$

Because the pensioner’s initial end-of-month capital was 3, the desired answer
is $Q^4_{3,1} = 201/256$. ■
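
The fourth power in Example 4.12 is easy to reproduce exactly. The sketch below uses Python's `fractions` module; the helper name `pensioner_matrix` is our own, and the dictionary `p` holding the expense probabilities is simply a convenient representation.

```python
from fractions import Fraction


def pensioner_matrix(p):
    """Matrix Q of Example 4.12; p[k] is the probability expenses equal k."""
    return [[Fraction(1), Fraction(0), Fraction(0)],   # state 1 is absorbing
            [p[3] + p[4], p[2], p[1]],                 # from end-of-month 2
            [p[4], p[3], p[1] + p[2]]]                 # from end-of-month 3


def mat_pow(q, m):
    """Return Q^m using exact Fraction arithmetic."""
    n = len(q)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = [[sum(result[i][k] * q[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result


p = {k: Fraction(1, 4) for k in (1, 2, 3, 4)}
Q4 = mat_pow(pensioner_matrix(p), 4)

# Rows and columns are indexed 0, 1, 2 for states 1, 2, 3.
print(Q4[2][0])   # probability of ever falling to 1 or less within four months
```

Because state 1 is absorbing, the first column of `Q4` accumulates the probability of having fallen to 1 or less by the end of the fourth month.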


Suppose now that we want to compute the probability that the $\{X_n, n \ge 0\}$
chain, starting in state i, enters state j at time m without ever entering any of the
states in $\mathcal{A}$, where neither i nor j is in $\mathcal{A}$. That is, for $i, j \notin \mathcal{A}$, we are interested
in

$$\alpha = P(X_m = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-1 \mid X_0 = i)$$

Noting that the event that $X_m = j$, $X_k \notin \mathcal{A}$, $k = 1, \ldots, m-1$ is equivalent to the
event that $W_m = j$, it follows that for $i, j \notin \mathcal{A}$,

$$
P(X_m = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-1 \mid X_0 = i)
= P(W_m = j \mid W_0 = i) = Q^m_{i,j}
$$

For instance, in Example 4.12, starting with 5 at the beginning of January, the
probability that the pensioner’s capital is 4 at the beginning of May without ever
having been less than or equal to 1 in that time is $Q^4_{3,2} = 21/256$.


Example 4.13 Consider a Markov chain with states 1,2,3,4,5, and suppose 
that we want to compute 


$$P(X_4 = 2,\ X_3 \le 2,\ X_2 \le 2,\ X_1 \le 2 \mid X_0 = 1)$$

That is, we want the probability that, starting in state 1, the chain is in state 2 at
time 4 and has never entered any of the states in the set $\mathcal{A} = \{3, 4, 5\}$.

To compute this probability all we need to know are the transition probabilities
$P_{11}, P_{12}, P_{21}, P_{22}$. So, suppose that

$$
\begin{aligned}
P_{11} &= 0.3, & P_{12} &= 0.3 \\
P_{21} &= 0.1, & P_{22} &= 0.2
\end{aligned}
$$


Then we consider the Markov chain having states 1, 2, 3 (we are giving state A
the name 3), and having the transition probability matrix Q as follows:

$$
Q = \begin{bmatrix}
0.3 & 0.3 & 0.4 \\
0.1 & 0.2 & 0.7 \\
0 & 0 & 1
\end{bmatrix}
$$

The desired probability is $Q^4_{1,2}$. Raising Q to the power 4 yields the matrix

$$
Q^4 = \begin{bmatrix}
0.0219 & 0.0285 & 0.9496 \\
0.0095 & 0.0124 & 0.9781 \\
0 & 0 & 1
\end{bmatrix}
$$

Hence, the desired probability is $\alpha = 0.0285$. ■
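
When $i \notin \mathcal{A}$ but $j \in \mathcal{A}$ we can determine the probability

$$\alpha = P(X_m = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-1 \mid X_0 = i)$$

as follows.

$$
\begin{aligned}
\alpha &= \sum_{r \notin \mathcal{A}} P(X_m = j,\ X_{m-1} = r,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-2 \mid X_0 = i) \\
&= \sum_{r \notin \mathcal{A}} P(X_m = j \mid X_{m-1} = r,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-2,\ X_0 = i) \\
&\qquad \times P(X_{m-1} = r,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-2 \mid X_0 = i) \\
&= \sum_{r \notin \mathcal{A}} P_{r,j}\, P(X_{m-1} = r,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-2 \mid X_0 = i) \\
&= \sum_{r \notin \mathcal{A}} P_{r,j}\, Q^{m-1}_{i,r}
\end{aligned}
$$

Also, when $i \in \mathcal{A}$ we could determine

$$\alpha = P(X_m = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-1 \mid X_0 = i)$$

by conditioning on the first transition to obtain

$$
\begin{aligned}
\alpha &= \sum_{r \notin \mathcal{A}} P(X_m = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-1 \mid X_0 = i,\ X_1 = r)\, P(X_1 = r \mid X_0 = i) \\
&= \sum_{r \notin \mathcal{A}} P(X_{m-1} = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-2 \mid X_0 = r)\, P_{i,r}
\end{aligned}
$$

For instance, if $i \in \mathcal{A}$, $j \notin \mathcal{A}$ then the preceding equation yields

$$
P(X_m = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, m-1 \mid X_0 = i)
= \sum_{r \notin \mathcal{A}} Q^{m-1}_{r,j}\, P_{i,r}
$$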


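We can also compute the conditional probability of $X_n$ given that the chain
starts in state i and has not entered any state in $\mathcal{A}$ by time n, as follows.
For $i, j \notin \mathcal{A}$,

$$
\begin{aligned}
P\{X_n = j \mid X_0 = i,\ X_k \notin \mathcal{A},\ k = 1, \ldots, n\}
&= \frac{P\{X_n = j,\ X_k \notin \mathcal{A},\ k = 1, \ldots, n \mid X_0 = i\}}
        {P\{X_k \notin \mathcal{A},\ k = 1, \ldots, n \mid X_0 = i\}} \\
&= \frac{Q^n_{i,j}}{\sum_{r \notin \mathcal{A}} Q^n_{i,r}}
\end{aligned}
$$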
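
To illustrate the last formula, the sketch below reuses the matrix Q of Example 4.13 and computes the conditional distribution of $X_4$ given that the chain started in state 1 and stayed out of $\mathcal{A} = \{3, 4, 5\}$ through time 4. The variable names are our own, and the choice of this example as the illustration is ours rather than the text's.

```python
def mat_pow(q, m):
    """Return Q^m by repeated multiplication of lists of rows."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = [[sum(result[i][k] * q[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result


# Matrix Q of Example 4.13: indices 0 and 1 are states 1 and 2, and index 2
# is the absorbing state that stands for the set {3, 4, 5}.
Q = [[0.3, 0.3, 0.4],
     [0.1, 0.2, 0.7],
     [0.0, 0.0, 1.0]]

n_steps = 4
Qn = mat_pow(Q, n_steps)

start = 0                                        # X_0 = 1
not_absorbed = sum(Qn[start][r] for r in range(2))   # sum of Q^n_{i,r}, r not in A
conditional = [Qn[start][j] / not_absorbed for j in range(2)]
print([round(x, 4) for x in conditional])
```

The denominator is the probability that the set has not been entered by time 4, so the two conditional probabilities sum to 1.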


4.3 Classification of States 


State j is said to be accessible from state i if $P^n_{ij} > 0$ for some $n \ge 0$. Note that this
implies that state j is accessible from state i if and only if, starting in i, it is possible
that the process will ever enter state j. This is true since if j is not accessible from
i, then

$$
\begin{aligned}
P\{\text{ever enter } j \mid \text{start in } i\}
&= P\left\{\bigcup_{n=0}^{\infty} \{X_n = j\} \,\Big|\, X_0 = i\right\} \\
&\le \sum_{n=0}^{\infty} P\{X_n = j \mid X_0 = i\} \\
&= \sum_{n=0}^{\infty} P^n_{ij} \\
&= 0
\end{aligned}
$$


Two states i and j that are accessible to each other are said to communicate, and
we write $i \leftrightarrow j$.
Note that any state communicates with itself since, by definition,

$$P^0_{ii} = P\{X_0 = i \mid X_0 = i\} = 1$$

The relation of communication satisfies the following three properties: 


(i) State i communicates with state i, all $i \ge 0$.
(ii) If state i communicates with state j, then state j communicates with state i.
(iii) If state i communicates with state j, and state j communicates with state k, then state
i communicates with state k.


Properties (i) and (ii) follow immediately from the definition of communication.
To prove (iii) suppose that i communicates with j, and j communicates with k.
Thus, there exist integers n and m such that $P^n_{ij} > 0$, $P^m_{jk} > 0$. Now by the
Chapman-Kolmogorov equations, we have

$$P^{n+m}_{ik} = \sum_{r=0}^{\infty} P^n_{ir} P^m_{rk} \ge P^n_{ij} P^m_{jk} > 0$$

Hence, state k is accessible from state i. Similarly, we can show that state i is
accessible from state k. Hence, states i and k communicate.

Two states that communicate are said to be in the same class. It is an easy 
consequence of (i), (ii), and (iii) that any two classes of states are either identical 
or disjoint. In other words, the concept of communication divides the state space 
up into a number of separate classes. The Markov chain is said to be irreducible 
if there is only one class, that is, if all states communicate with each other. 


Example 4.14 Consider the Markov chain consisting of the three states 0, 1, 2 
and having transition probability matrix 


$$
P = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & 0 \\
\frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\
0 & \frac{1}{3} & \frac{2}{3}
\end{bmatrix}
$$

It is easy to verify that this Markov chain is irreducible. For example, it is possible
to go from state 0 to state 2 since

$$0 \to 1 \to 2$$

That is, one way of getting from state 0 to state 2 is to go from state 0 to state 1
(with probability $\frac{1}{2}$) and then go from state 1 to state 2 (with probability $\frac{1}{4}$). ■


Example 4.15 Consider a Markov chain consisting of the four states 0, 1, 2, 3 
and having transition probability matrix 


$$
P = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

The classes of this Markov chain are {0, 1}, {2}, and {3}. Note that while state 0 (or
1) is accessible from state 2, the reverse is not true. Since state 3 is an absorbing
state, that is, $P_{33} = 1$, no other state is accessible from it. ■
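
For a finite chain, the classes can be found mechanically: state j is accessible from i exactly when j can be reached along transitions of positive probability, and two states share a class when each is accessible from the other. The sketch below applies this to the matrices of Examples 4.14 and 4.15 as we read them; the function names are our own.

```python
def reachable(P, i):
    """States accessible from i (i itself included, since P^0_ii = 1)."""
    seen = {i}
    stack = [i]
    while stack:
        s = stack.pop()
        for t, prob in enumerate(P[s]):
            if prob > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def communicating_classes(P):
    """Partition the states 0..n-1 into classes of communicating states."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # j is in the class of i when each is accessible from the other.
        cls = sorted(j for j in reach[i] if i in reach[j])
        classes.append(cls)
        assigned.update(cls)
    return classes


P_414 = [[1/2, 1/2, 0],
         [1/2, 1/4, 1/4],
         [0, 1/3, 2/3]]

P_415 = [[1/2, 1/2, 0, 0],
         [1/2, 1/2, 0, 0],
         [1/4, 1/4, 1/4, 1/4],
         [0, 0, 0, 1]]

print(communicating_classes(P_414))   # a single class: the chain is irreducible
print(communicating_classes(P_415))
```

A chain is irreducible precisely when the returned list contains one class.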


For any state i we let $f_i$ denote the probability that, starting in state i, the process
will ever reenter state i. State i is said to be recurrent if $f_i = 1$ and transient if
$f_i < 1$.

Suppose that the process starts in state i and i is recurrent. Hence, with prob- 
ability 1, the process will eventually reenter state i. However, by the definition 
of a Markov chain, it follows that the process will be starting over again when it 
reenters state i and, therefore, state i will eventually be visited again. Continual 
repetition of this argument leads to the conclusion that if state i is recurrent then,
starting in state i, the process will reenter state i again and again and again; in
fact, infinitely often.

On the other hand, suppose that state i is transient. Hence, each time the
process enters state i there will be a positive probability, namely, $1 - f_i$, that it
will never again enter that state. Therefore, starting in state i, the probability that
the process will be in state i for exactly n time periods equals $f_i^{n-1}(1 - f_i)$, $n \ge 1$.
In other words, if state i is transient then, starting in state i, the number of time
periods that the process will be in state i has a geometric distribution with finite
mean $1/(1 - f_i)$.

From the preceding two paragraphs, it follows that state i is recurrent if and 
only if, starting in state i, the expected number of time periods that the process 
is in state i is infinite. But, letting 


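$$
I_n = \begin{cases}
1, & \text{if } X_n = i \\
0, & \text{if } X_n \ne i
\end{cases}
$$

we have that $\sum_{n=0}^{\infty} I_n$ represents the number of periods that the process is in
state i. Also,

$$
\begin{aligned}
E\left[\sum_{n=0}^{\infty} I_n \,\Big|\, X_0 = i\right]
&= \sum_{n=0}^{\infty} E[I_n \mid X_0 = i] \\
&= \sum_{n=0}^{\infty} P\{X_n = i \mid X_0 = i\} \\
&= \sum_{n=0}^{\infty} P^n_{ii}
\end{aligned}
$$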

We have thus proven the following. 


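Proposition 4.1 State i is

$$
\begin{aligned}
&\text{recurrent if } \sum_{n=1}^{\infty} P^n_{ii} = \infty, \\
&\text{transient if } \sum_{n=1}^{\infty} P^n_{ii} < \infty
\end{aligned}
$$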


The argument leading to the preceding proposition is doubly important because 
it also shows that a transient state will only be visited a finite number of times 
(hence the name transient). This leads to the conclusion that in a finite-state 
Markov chain not all states can be transient. To see this, suppose the states are
0, 1, ..., M and suppose that they are all transient. Then after a finite amount of
time (say, after time $T_0$) state 0 will never be visited, and after a time (say, $T_1$)
state 1 will never be visited, and after a time (say, $T_2$) state 2 will never be visited,
and so on. Thus, after a finite time $T = \max\{T_0, T_1, \ldots, T_M\}$ no states will be
visited. But as the process must be in some state after time T we arrive at a
contradiction, which shows that at least one of the states must be recurrent.

Another use of Proposition 4.1 is that it enables us to show that recurrence is 
a class property. 


Corollary 4.2 If state i is recurrent, and state i communicates with state j, then
state j is recurrent.


Proof. To prove this we first note that, since state i communicates with state j,
there exist integers k and m such that $P^k_{ij} > 0$, $P^m_{ji} > 0$. Now, for any integer n

$$P^{m+n+k}_{jj} \ge P^m_{ji} P^n_{ii} P^k_{ij}$$

This follows since the left side of the preceding is the probability of going from
j to j in m + n + k steps, while the right side is the probability of going from j to j
in m + n + k steps via a path that goes from j to i in m steps, then from i to i in
an additional n steps, then from i to j in an additional k steps.

From the preceding we obtain, by summing over n, that

$$
\sum_{n=1}^{\infty} P^{m+n+k}_{jj} \ge P^m_{ji} P^k_{ij} \sum_{n=1}^{\infty} P^n_{ii} = \infty
$$

since $P^m_{ji} P^k_{ij} > 0$ and $\sum_{n=1}^{\infty} P^n_{ii}$ is infinite since state i is recurrent. Thus, by
Proposition 4.1 it follows that state j is also recurrent. ■


Remarks 


(i) Corollary 4.2 also implies that transience is a class property. For if state i is transient
and communicates with state j, then state j must also be transient. For if j were
recurrent then, by Corollary 4.2, i would also be recurrent and hence could not be
transient.

(ii) Corollary 4.2 along with our previous result that not all states in a finite Markov 
chain can be transient leads to the conclusion that all states of a finite irreducible 
Markov chain are recurrent. 


Example 4.16 Let the Markov chain consisting of the states 0,1,2,3 have the 
transition probability matrix 


$$
P = \begin{bmatrix}
0 & 0 & \frac{1}{2} & \frac{1}{2} \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}
$$

Determine which states are transient and which are recurrent.

Solution: It is a simple matter to check that all states communicate and, hence,
since this is a finite chain, all states must be recurrent. ■


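Example 4.17 Consider the Markov chain having states 0, 1, 2, 3, 4 and

$$
P = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 \\
0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 \\
\frac{1}{4} & \frac{1}{4} & 0 & 0 & \frac{1}{2}
\end{bmatrix}
$$

Determine the recurrent states.


Solution: This chain consists of the three classes {0, 1}, {2, 3}, and {4}. The
first two classes are recurrent and the third transient. ■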


Example 4.18 (A Random Walk) Consider a Markov chain whose state space
consists of the integers $i = 0, \pm 1, \pm 2, \ldots$, and has transition probabilities
given by

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 0, \pm 1, \pm 2, \ldots$$


where 0 < p < 1. In other words, on each transition the process either moves one 
step to the right (with probability p) or one step to the left (with probability 1—p). 
One colorful interpretation of this process is that it represents the wanderings of 
a drunken man as he walks along a straight line. Another is that it represents the 
winnings of a gambler who on each play of the game either wins or loses one 
dollar. 

Since all states clearly communicate, it follows from Corollary 4.2 that they
are either all transient or all recurrent. So let us consider state 0 and attempt to
determine if $\sum_{n=1}^{\infty} P^n_{00}$ is finite or infinite.

Since it is impossible to be even (using the gambling model interpretation) after
an odd number of plays we must, of course, have that

$$P^{2n-1}_{00} = 0, \quad n = 1, 2, \ldots$$

On the other hand, we would be even after 2n trials if and only if we won n
of these and lost n of these. Because each play of the game results in a win with
probability p and a loss with probability 1 − p, the desired probability is thus the
binomial probability


$$
P^{2n}_{00} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{n!\,n!}\,(p(1-p))^n, \quad n = 1, 2, 3, \ldots
$$

By using an approximation, due to Stirling, which asserts that

$$n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi} \tag{4.3}$$

where we say that $a_n \sim b_n$ when $\lim_{n \to \infty} a_n/b_n = 1$, we obtain

$$P^{2n}_{00} \sim \frac{(4p(1-p))^n}{\sqrt{\pi n}}$$

Now it is easy to verify, for positive $a_n, b_n$, that if $a_n \sim b_n$, then $\sum_n a_n < \infty$ if
and only if $\sum_n b_n < \infty$. Hence, $\sum_{n=1}^{\infty} P^{2n}_{00}$ will converge if and only if

$$\sum_{n=1}^{\infty} \frac{(4p(1-p))^n}{\sqrt{\pi n}}$$

does. However, $4p(1-p) \le 1$ with equality holding if and only if $p = \frac{1}{2}$. Hence,
$\sum_{n=1}^{\infty} P^{n}_{00} = \infty$ if and only if $p = \frac{1}{2}$. Thus, the chain is recurrent when $p = \frac{1}{2}$ and
transient if $p \ne \frac{1}{2}$.

When $p = \frac{1}{2}$, the preceding process is called a symmetric random walk. We
could also look at symmetric random walks in more than one dimension. For
instance, in the two-dimensional symmetric random walk the process would, at
each transition, either take one step to the left, right, up, or down, each having
probability $\frac{1}{4}$. That is, the state is the pair of integers (i, j) and the transition
probabilities are given by

$$
P_{(i,j),(i+1,j)} = P_{(i,j),(i-1,j)} = P_{(i,j),(i,j+1)} = P_{(i,j),(i,j-1)} = \frac{1}{4}
$$

By using the same method as in the one-dimensional case, we now show that this
Markov chain is also recurrent.

Since the preceding chain is irreducible, it follows that all states will be recurrent
if state 0 = (0, 0) is recurrent. So consider $P^{2n}_{00}$. Now after 2n steps, the chain
will be back in its original location if for some i, $0 \le i \le n$, the 2n steps consist
of i steps to the left, i to the right, n − i up, and n − i down. Since each step
will be either of these four types with probability $\frac{1}{4}$, it follows that the desired
probability is a multinomial probability. That is,


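$$
\begin{aligned}
P^{2n}_{00} &= \sum_{i=0}^{n} \frac{(2n)!}{i!\,i!\,(n-i)!\,(n-i)!} \left(\frac{1}{4}\right)^{2n} \\
&= \sum_{i=0}^{n} \frac{(2n)!}{n!\,n!}\, \frac{n!}{(n-i)!\,i!}\, \frac{n!}{(n-i)!\,i!} \left(\frac{1}{4}\right)^{2n} \\
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \sum_{i=0}^{n} \binom{n}{i} \binom{n}{n-i} \\
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \binom{2n}{n}
\end{aligned}
\tag{4.4}
$$

where the last equality uses the combinatorial identity

$$\binom{2n}{n} = \sum_{i=0}^{n} \binom{n}{i} \binom{n}{n-i}$$

which follows upon noting that both sides represent the number of subgroups of
size n one can select from a set of n white and n black objects. Now,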


$$
\begin{aligned}
\binom{2n}{n} &= \frac{(2n)!}{n!\,n!} \\
&\sim \frac{(2n)^{2n+1/2} e^{-2n} \sqrt{2\pi}}{n^{2n+1} e^{-2n} (2\pi)} \quad \text{by Stirling’s approximation} \\
&= \frac{4^n}{\sqrt{\pi n}}
\end{aligned}
$$

Hence, from Equation (4.4) we see that

$$P^{2n}_{00} \sim \frac{1}{\pi n}$$

which shows that $\sum_n P^{2n}_{00} = \infty$, and thus all states are recurrent.

Interestingly enough, whereas the symmetric random walks in one and two
dimensions are both recurrent, all higher-dimensional symmetric random walks
turn out to be transient. (For instance, the three-dimensional symmetric random
walk is at each transition equally likely to move in any of six ways: either to the
left, right, up, down, in, or out.) ■
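
The asymptotic claims of Example 4.18 can be examined numerically. The sketch below evaluates the exact return probabilities and compares them with the Stirling-based approximations; the function names are our own, and exact integer binomial coefficients are used so that no rounding enters before the final division.

```python
from math import comb, pi, sqrt


def p00_1d(n, p):
    """Exact P^{2n}_{00} for the one-dimensional walk: C(2n, n) (p(1-p))^n."""
    return comb(2 * n, n) / 4 ** n * (4 * p * (1 - p)) ** n


def p00_2d(n):
    """Exact P^{2n}_{00} for the symmetric two-dimensional walk, Eq. (4.4)."""
    return (comb(2 * n, n) / 4 ** n) ** 2


n = 100
ratio_1d = p00_1d(n, 0.5) * sqrt(pi * n)   # should be close to 1
ratio_2d = p00_2d(n) * pi * n              # should be close to 1

# Partial sums: they keep growing when p = 1/2 and level off when p = 0.6.
partial_symmetric = [sum(p00_1d(k, 0.5) for k in range(1, m + 1))
                     for m in (100, 400)]
partial_biased = [sum(p00_1d(k, 0.6) for k in range(1, m + 1))
                  for m in (100, 400)]
print(ratio_1d, ratio_2d, partial_symmetric, partial_biased)
```

The growing partial sums for p = 1/2 and the stabilizing ones for p = 0.6 are only numerical evidence; the proof of recurrence or transience is the comparison argument given in the text.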


Remark For the one-dimensional random walk of Example 4.18 here is a direct
argument for establishing recurrence in the symmetric case, and for determining
the probability that it ever returns to state 0 in the nonsymmetric case. Let

$$\beta = P\{\text{ever return to } 0\}$$

To determine $\beta$, start by conditioning on the initial transition to obtain


$$
\beta = P\{\text{ever return to } 0 \mid X_1 = 1\}\,p + P\{\text{ever return to } 0 \mid X_1 = -1\}\,(1 - p)
\tag{4.5}
$$

Now, let $\alpha$ denote the probability that the Markov chain will ever return to state 0
given that it is currently in state 1. Because the Markov chain will always increase
by 1 with probability p or decrease by 1 with probability 1 − p no matter what its
current state, note that $\alpha$ is also the probability that the Markov chain currently in
state i will ever enter state i − 1, for any i. To obtain an equation for $\alpha$, condition
on the next transition to obtain

$$
\begin{aligned}
\alpha &= P\{\text{ever return} \mid X_1 = 1, X_2 = 0\}(1 - p) + P\{\text{ever return} \mid X_1 = 1, X_2 = 2\}\,p \\
&= 1 - p + P\{\text{ever return} \mid X_1 = 1, X_2 = 2\}\,p \\
&= 1 - p + p\alpha^2
\end{aligned}
$$

where the final equation follows by noting that in order for the chain to ever go
from state 2 to state 0 it must first go to state 1 (the probability of that ever
happening is $\alpha$), and if it does eventually go to state 1 then it must still go to state
0 (the conditional probability of that ever happening is also $\alpha$). Therefore,

$$\alpha = 1 - p + p\alpha^2$$

The two roots of this equation are $\alpha = 1$ and $\alpha = (1 - p)/p$. Consequently, in the
case of the symmetric random walk where p = 1/2 we can conclude that $\alpha = 1$.
By symmetry, the probability that the symmetric random walk will ever enter
state 0 given that it is currently in state −1 is also 1, proving that the symmetric
random walk is recurrent.

Suppose now that p > 1/2. In this case, it can be shown (see Exercise 17 at
the end of this chapter) that $P\{\text{ever return to } 0 \mid X_1 = -1\} = 1$. Consequently,
Equation (4.5) reduces to

$$\beta = \alpha p + 1 - p$$

Because the random walk is transient in this case we know that $\beta < 1$, showing
that $\alpha \ne 1$. Therefore, $\alpha = (1 - p)/p$, yielding that

$$\beta = 2(1 - p), \quad p > 1/2$$

Similarly, when p < 1/2 we can show that $\beta = 2p$. Thus, in general

$$P\{\text{ever return to } 0\} = 2\min(p, 1 - p)$$
■
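
The remark's conclusion can be restated as a few lines of arithmetic. In the sketch below the function names are our own; `one_step_down_prob` simply encodes the choice of root argued in the remark (the root 1 when p is at most 1/2, and (1 − p)/p otherwise), and `return_prob` substitutes into Equation (4.5), using symmetry for the step from −1 back to 0.

```python
def one_step_down_prob(p):
    """alpha: probability of ever moving from state i to state i - 1.

    A root of alpha = 1 - p + p * alpha**2, chosen as in the remark.
    """
    return 1.0 if p <= 0.5 else (1 - p) / p


def return_prob(p):
    """beta = P{ever return to 0}, from Equation (4.5)."""
    from_plus_one = one_step_down_prob(p)        # back to 0 from state +1
    from_minus_one = one_step_down_prob(1 - p)   # back to 0 from -1, by symmetry
    return p * from_plus_one + (1 - p) * from_minus_one


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, return_prob(p), 2 * min(p, 1 - p))
```

Each printed pair agrees, matching the formula 2 min(p, 1 − p).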


Example 4.19 (On the Ultimate Instability of the Aloha Protocol) Consider a
communications facility in which the numbers of messages arriving during each of
the time periods n = 1, 2, ... are independent and identically distributed random
variables. Let $a_i = P\{i \text{ arrivals}\}$, and suppose that $a_0 + a_1 < 1$. Each arriving
message will transmit at the end of the period in which it arrives. If exactly
one message is transmitted, then the transmission is successful and the message
leaves the system. However, if at any time two or more messages simultaneously 
transmit, then a collision is deemed to occur and these messages remain in the 
system. Once a message is involved in a collision it will, independently of all 
else, transmit at the end of each additional period with probability p—the so- 
called Aloha protocol (because it was first instituted at the University of Hawaii). 
We will show that such a system is asymptotically unstable in the sense that the 
number of successful transmissions will, with probability 1, be finite. 

To begin let $X_n$ denote the number of messages in the facility at the beginning
of the nth period, and note that $\{X_n, n \ge 0\}$ is a Markov chain. Now for $k \ge 0$
define the indicator variables $I_k$ by

$$
I_k = \begin{cases}
1, & \text{if the first time that the chain departs state } k \text{ it directly goes to state } k - 1 \\
0, & \text{otherwise}
\end{cases}
$$

and let it be 0 if the system is never in state k, $k \ge 0$. (For instance, if the successive
states are 0, 1, 3, 4, ..., then $I_3 = 0$ since when the chain first departs state 3 it
goes to state 4; whereas, if they are 0, 3, 3, 2, ..., then $I_3 = 1$ since this time it
goes to state 2.) Now,

$$
\begin{aligned}
E\left[\sum_{k=0}^{\infty} I_k\right] &= \sum_{k=0}^{\infty} E[I_k] \\
&= \sum_{k=0}^{\infty} P\{I_k = 1\} \\
&\le \sum_{k=0}^{\infty} P\{I_k = 1 \mid k \text{ is ever visited}\}
\end{aligned}
\tag{4.6}
$$


Now, $P\{I_k = 1 \mid k \text{ is ever visited}\}$ is the probability that when state k is departed
the next state is k − 1. That is, it is the conditional probability that a transition
from k is to k − 1 given that it is not back into k, and so

$$P\{I_k = 1 \mid k \text{ is ever visited}\} = \frac{P_{k,k-1}}{1 - P_{k,k}}$$

Because

$$
\begin{aligned}
P_{k,k-1} &= a_0 k p (1 - p)^{k-1} \\
P_{k,k} &= a_0[1 - kp(1 - p)^{k-1}] + a_1 (1 - p)^k
\end{aligned}
$$


which is seen by noting that if there are k messages present at the beginning of
a day, then (a) there will be k − 1 at the beginning of the next day if there are
no new messages that day and exactly one of the k messages transmits; and (b)
there will be k at the beginning of the next day if either


(i) there are no new messages and it is not the case that exactly one of the existing k 
messages transmits, or 

(ii) there is exactly one new message (which automatically transmits) and none of the 
other k messages transmits. 


Substitution of the preceding into Equation (4.6) yields

$$
E\left[\sum_{k=0}^{\infty} I_k\right]
\le \sum_{k=0}^{\infty} \frac{a_0 k p (1 - p)^{k-1}}{1 - a_0[1 - kp(1 - p)^{k-1}] - a_1 (1 - p)^k}
< \infty
$$

where the convergence follows by noting that when k is large the denominator of
the expression in the preceding sum converges to $1 - a_0$ and so the convergence
or divergence of the sum is determined by whether or not the sum of the terms
in the numerator converge and $\sum_{k=0}^{\infty} k(1 - p)^{k-1} < \infty$.

Hence, $E[\sum_{k=0}^{\infty} I_k] < \infty$, which implies that $\sum_{k=0}^{\infty} I_k < \infty$ with probability 1
(for if there was a positive probability that $\sum_{k=0}^{\infty} I_k$ could be $\infty$, then its mean
would be $\infty$). Hence, with probability 1, there will be only a finite number of
states that are initially departed via a successful transmission; or equivalently,
there will be some finite integer N such that whenever there are N or more
messages in the system, there will never again be a successful transmission. From
this (and the fact that such higher states will eventually be reached; why?) it
follows that, with probability 1, there will only be a finite number of successful
transmissions. ■


Remark For a (slightly less than rigorous) probabilistic proof of Stirling’s
approximation, let $X_1, X_2, \ldots$ be independent Poisson random variables each having
mean 1. Let $S_n = \sum_{i=1}^{n} X_i$, and note that both the mean and variance of
$S_n$ are equal to n. Now,

$$
\begin{aligned}
P\{S_n = n\} &= P\{n - 1 < S_n \le n\} \\
&= P\{-1/\sqrt{n} < (S_n - n)/\sqrt{n} \le 0\} \\
&\approx \int_{-1/\sqrt{n}}^{0} (2\pi)^{-1/2} e^{-x^2/2}\,dx
\quad \text{when } n \text{ is large, by the central limit theorem} \\
&\approx (2\pi)^{-1/2} (1/\sqrt{n}) \\
&= (2\pi n)^{-1/2}
\end{aligned}
$$

But $S_n$ is Poisson with mean n, and so

$$P\{S_n = n\} = \frac{e^{-n} n^n}{n!}$$

Hence, for n large

$$\frac{e^{-n} n^n}{n!} \approx (2\pi n)^{-1/2}$$

or, equivalently

$$n! \approx n^{n+1/2} e^{-n} \sqrt{2\pi}$$

which is Stirling’s approximation.
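
A quick numerical look at the quality of Stirling's approximation may be useful. The sketch below, with a function name of our own choosing, computes the ratio of n! to the approximation through logarithms (via the standard library's `math.lgamma`) so that large n does not overflow.

```python
from math import exp, lgamma, log, pi


def stirling_ratio(n):
    """n! divided by n^(n + 1/2) e^(-n) sqrt(2 pi), computed via logarithms."""
    log_factorial = lgamma(n + 1)                        # log(n!)
    log_approx = (n + 0.5) * log(n) - n + 0.5 * log(2 * pi)
    return exp(log_factorial - log_approx)


for n in (1, 10, 100, 1000):
    print(n, stirling_ratio(n))
```

The ratios decrease toward 1 as n grows, which is what the relation $a_n \sim b_n$ requires.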


4.4 Limiting Probabilities 


In Example 4.8, we calculated $P^{(4)}$ for a two-state Markov chain; it turned out
to be

$$
P^{(4)} = \begin{bmatrix}
0.5749 & 0.4251 \\
0.5668 & 0.4332
\end{bmatrix}
$$

From this it follows that $P^{(8)} = P^{(4)} \cdot P^{(4)}$ is given (to three significant places) by

$$
P^{(8)} = \begin{bmatrix}
0.572 & 0.428 \\
0.570 & 0.430
\end{bmatrix}
$$

Note that the matrix $P^{(8)}$ is almost identical to the matrix $P^{(4)}$, and secondly, that
each of the rows of $P^{(8)}$ has almost identical entries. In fact it seems that $P^n_{ij}$ is
converging to some value (as $n \to \infty$) that is the same for all i. In other words,
there seems to exist a limiting probability that the process will be in state j after 
a large number of transitions, and this value is independent of the initial state. 

To make the preceding heuristics more precise, two additional properties of
the states of a Markov chain need to be considered. State i is said to have period
d if $P^n_{ii} = 0$ whenever n is not divisible by d, and d is the largest integer with this
property. For instance, starting in i, it may be possible for the process to enter
state i only at the times 2, 4, 6, 8, ..., in which case state i has period 2. A state
with period 1 is said to be aperiodic. It can be shown that periodicity is a class
property. That is, if state i has period d, and states i and j communicate, then
state j also has period d.

If state i is recurrent, then it is said to be positive recurrent if, starting in i,
the expected time until the process returns to state i is finite. It can be shown
that positive recurrence is a class property. While there exist recurrent states that
are not positive recurrent,* it can be shown that in a finite-state Markov chain
all recurrent states are positive recurrent. Positive recurrent, aperiodic states are
called ergodic.

We are now ready for the following important theorem, which we state without 
proof. 


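Theorem 4.1 For an irreducible ergodic Markov chain $\lim_{n \to \infty} P^n_{ij}$ exists and is
independent of i. Furthermore, letting

$$\pi_j = \lim_{n \to \infty} P^n_{ij}, \quad j \ge 0$$

then $\pi_j$ is the unique nonnegative solution of

$$
\begin{aligned}
\pi_j &= \sum_{i=0}^{\infty} \pi_i P_{ij}, \quad j \ge 0, \\
\sum_{j=0}^{\infty} \pi_j &= 1
\end{aligned}
\tag{4.7}
$$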


Remarks 


(i) Given that $\pi_j = \lim_{n \to \infty} P^n_{ij}$ exists and is independent of the initial state i, it is not
difficult to (heuristically) see that the $\pi$’s must satisfy Equation (4.7). Let us derive
an expression for $P\{X_{n+1} = j\}$ by conditioning on the state at time n. That is,

$$
\begin{aligned}
P\{X_{n+1} = j\} &= \sum_{i=0}^{\infty} P\{X_{n+1} = j \mid X_n = i\}\, P\{X_n = i\} \\
&= \sum_{i=0}^{\infty} P_{ij}\, P\{X_n = i\}
\end{aligned}
$$

Letting $n \to \infty$, and assuming that we can bring the limit inside the summation,
leads to

$$\pi_j = \sum_{i=0}^{\infty} P_{ij}\, \pi_i$$

(ii) It can be shown that $\pi_j$, the limiting probability that the process will be in state j at
time n, also equals the long-run proportion of time that the process will be in state j.


* Such states are called null recurrent.

(iii) If the Markov chain is irreducible, then there will be a solution to

$$
\begin{aligned}
\pi_j &= \sum_{i} \pi_i P_{ij}, \quad j \ge 0, \\
\sum_{j} \pi_j &= 1
\end{aligned}
$$

if and only if the Markov chain is positive recurrent. If a solution exists then it will
be unique, and $\pi_j$ will equal the long-run proportion of time that the Markov chain
is in state j. If the chain is aperiodic, then $\pi_j$ is also the limiting probability that the
chain is in state j.


Example 4.20 Consider Example 4.1, in which we assume that if it rains today,
then it will rain tomorrow with probability $\alpha$; and if it does not rain today, then
it will rain tomorrow with probability $\beta$. If we say that the state is 0 when it rains
and 1 when it does not rain, then by Equation (4.7) the limiting probabilities $\pi_0$
and $\pi_1$ are given by

$$
\begin{aligned}
\pi_0 &= \alpha \pi_0 + \beta \pi_1, \\
\pi_1 &= (1 - \alpha)\pi_0 + (1 - \beta)\pi_1, \\
\pi_0 + \pi_1 &= 1
\end{aligned}
$$

which yields that

$$\pi_0 = \frac{\beta}{1 + \beta - \alpha}, \qquad \pi_1 = \frac{1 - \alpha}{1 + \beta - \alpha}$$

For example if $\alpha = 0.7$ and $\beta = 0.4$, then the limiting probability of rain is
$\pi_0 = \frac{4}{7} = 0.571$. ■


Example 4.21 Consider Example 4.3 in which the mood of an individual is
considered as a three-state Markov chain having a transition probability matrix

$$
P = \begin{bmatrix}
0.5 & 0.4 & 0.1 \\
0.3 & 0.4 & 0.3 \\
0.2 & 0.3 & 0.5
\end{bmatrix}
$$

In the long run, what proportion of time is the process in each of the three states?


Solution: The limiting probabilities $\pi_i$, i = 0, 1, 2, are obtained by solving the
set of equations in Equation (4.7). In this case these equations are

$$
\begin{aligned}
\pi_0 &= 0.5\pi_0 + 0.3\pi_1 + 0.2\pi_2, \\
\pi_1 &= 0.4\pi_0 + 0.4\pi_1 + 0.3\pi_2, \\
\pi_2 &= 0.1\pi_0 + 0.3\pi_1 + 0.5\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{aligned}
$$

Solving yields

$$\pi_0 = \frac{21}{62}, \qquad \pi_1 = \frac{23}{62}, \qquad \pi_2 = \frac{18}{62}$$
■
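
The linear system of Equation (4.7) can be solved mechanically for any finite irreducible chain. The following is a sketch only, with a function name of our own: it drops one of the balance equations (they are linearly dependent), appends the normalization condition, and applies Gauss-Jordan elimination in exact arithmetic. It assumes the chain is finite and irreducible, so that the solution is unique.

```python
from fractions import Fraction


def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 by exact Gauss-Jordan elimination.

    Assumes a finite irreducible chain, so the solution exists and is unique.
    """
    n = len(P)
    # Rows 0..n-2: sum_i pi_i P[i][j] - pi_j = 0 for j = 0..n-2.
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(n)]
         + [Fraction(0)] for j in range(n - 1)]
    # Last row: the normalization sum_i pi_i = 1.
    A.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]


F = Fraction
P_mood = [[F(5, 10), F(4, 10), F(1, 10)],
          [F(3, 10), F(4, 10), F(3, 10)],
          [F(2, 10), F(3, 10), F(5, 10)]]
P_rain = [[F(7, 10), F(3, 10)],
          [F(4, 10), F(6, 10)]]

print(stationary_distribution(P_mood))
print(stationary_distribution(P_rain))
```

The first result reproduces the values of Example 4.21 (with 18/62 printed in lowest terms as 9/31), and the second gives 4/7 for the limiting probability of rain, as in Example 4.20 with $\alpha = 0.7$ and $\beta = 0.4$.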


Example 4.22 (A Model of Class Mobility) A problem of interest to sociologists 
is to determine the proportion of society that has an upper- or lower-class occu- 
pation. One possible mathematical model would be to assume that transitions 
between social classes of the successive generations in a family can be regarded 
as transitions of a Markov chain. That is, we assume that the occupation of a 
child depends only on his or her parent’s occupation. Let us suppose that such a 
model is appropriate and that the transition probability matrix is given by 


\[
P = \begin{Vmatrix}
0.45 & 0.48 & 0.07 \\
0.05 & 0.70 & 0.25 \\
0.01 & 0.50 & 0.49
\end{Vmatrix}
\tag{4.8}
\]


That is, for instance, we suppose that the child of a middle-class worker will 
attain an upper-, middle-, or lower-class occupation with respective probabilities 
0.05, 0.70, 0.25. 

The limiting probabilities $\pi_i$ thus satisfy

\[
\begin{aligned}
\pi_0 &= 0.45\pi_0 + 0.05\pi_1 + 0.01\pi_2, \\
\pi_1 &= 0.48\pi_0 + 0.70\pi_1 + 0.50\pi_2, \\
\pi_2 &= 0.07\pi_0 + 0.25\pi_1 + 0.49\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{aligned}
\]

Hence,

\[ \pi_0 = 0.07, \qquad \pi_1 = 0.62, \qquad \pi_2 = 0.31 \]

In other words, a society in which social mobility between classes can be described
by a Markov chain with transition probability matrix given by Equation (4.8)
has, in the long run, 7 percent of its people in upper-class jobs, 62 percent of its
people in middle-class jobs, and 31 percent in lower-class jobs. ■


Example 4.23 (The Hardy-Weinberg Law and a Markov Chain in Genetics) Consider
a large population of individuals, each of whom possesses a particular pair
of genes, of which each individual gene is classified as being of type A or type
a. Assume that the proportions of individuals whose gene pairs are AA, aa, or
Aa are, respectively, $p_0$, $q_0$, and $r_0$ ($p_0 + q_0 + r_0 = 1$). When two individuals
mate, each contributes one of his or her genes, chosen at random, to the resultant
offspring. Assuming that the mating occurs at random, in that each individual is
equally likely to mate with any other individual, we are interested in determining
the proportions of individuals in the next generation whose genes are AA, aa,
or Aa. Calling these proportions $p$, $q$, and $r$, they are easily obtained by focusing
attention on an individual of the next generation and then determining the
probabilities for the gene pair of that individual.

To begin, note that randomly choosing a parent and then randomly choos- 
ing one of its genes is equivalent to just randomly choosing a gene from the total 
gene population. By conditioning on the gene pair of the parent, we see that a 
randomly chosen gene will be type A with probability 


\[
\begin{aligned}
P\{A\} &= P\{A \mid AA\}p_0 + P\{A \mid aa\}q_0 + P\{A \mid Aa\}r_0 \\
&= p_0 + r_0/2
\end{aligned}
\]


Similarly, it will be type a with probability 
\[ P\{a\} = q_0 + r_0/2 \]


Thus, under random mating a randomly chosen member of the next generation 
will be type AA with probability p, where 


\[ p = P\{A\}P\{A\} = (p_0 + r_0/2)^2 \]

Similarly, the randomly chosen member will be type aa with probability

\[ q = P\{a\}P\{a\} = (q_0 + r_0/2)^2 \]

and will be type Aa with probability

\[ r = 2P\{A\}P\{a\} = 2(p_0 + r_0/2)(q_0 + r_0/2) \]


Since each member of the next generation will independently be of each of the 
three gene types with probabilities p, q, r, it follows that the percentages of the 
members of the next generation that are of type AA, aa, or Aa are respectively p, 
q, and r. 

If we now consider the total gene pool of this next generation, then $p + r/2$, the
fraction of its genes that are A, will be unchanged from the previous generation.
This follows either by arguing that the total gene pool has not changed from
generation to generation or by the following simple algebra:

\[
\begin{aligned}
p + r/2 &= (p_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2) \\
&= (p_0 + r_0/2)[p_0 + r_0/2 + q_0 + r_0/2] \\
&= p_0 + r_0/2 \quad \text{since } p_0 + r_0 + q_0 = 1 \\
&= P\{A\}
\end{aligned}
\tag{4.9}
\]
Thus, the fractions of the gene pool that are A and a are the same as in the initial
generation. From this it follows that, under random mating, in all successive
generations after the initial one the percentages of the population having gene
pairs AA, aa, and Aa will remain fixed at the values $p$, $q$, and $r$. This is known
as the Hardy-Weinberg law.

Suppose now that the gene pair population has stabilized in the percentages $p$,
$q$, $r$, and let us follow the genetic history of a single individual and her descendants.
(For simplicity, assume that each individual has exactly one offspring.) So, for
a given individual, let $X_n$ denote the genetic state of her descendant in the $n$th
generation. The transition probability matrix of this Markov chain, namely,

\[
\begin{array}{c|ccc}
 & AA & aa & Aa \\ \hline
AA & p + \dfrac{r}{2} & 0 & q + \dfrac{r}{2} \\[1.5ex]
aa & 0 & q + \dfrac{r}{2} & p + \dfrac{r}{2} \\[1.5ex]
Aa & \dfrac{p}{2} + \dfrac{r}{4} & \dfrac{q}{2} + \dfrac{r}{4} & \dfrac{p}{2} + \dfrac{q}{2} + \dfrac{r}{2}
\end{array}
\]

is easily verified by conditioning on the state of the randomly chosen mate. It
is quite intuitive (why?) that the limiting probabilities for this Markov chain
(which also equal the fractions of the individual’s descendants that are in each of
the three genetic states) should just be $p$, $q$, and $r$. To verify this we must show
that they satisfy Equation (4.7). Because one of the equations in Equation (4.7)
is redundant, it suffices to show that

\[
\begin{aligned}
p &= p\left(p + \frac{r}{2}\right) + r\left(\frac{p}{2} + \frac{r}{4}\right) = \left(p + \frac{r}{2}\right)^2, \\
q &= q\left(q + \frac{r}{2}\right) + r\left(\frac{q}{2} + \frac{r}{4}\right) = \left(q + \frac{r}{2}\right)^2, \\
p + q + r &= 1
\end{aligned}
\]

But this follows from Equation (4.9), and thus the result is established. ■
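
The two claims of this example (the genotype proportions are fixed after one generation, and $(p, q, r)$ is stationary for the descendant chain) can be checked numerically. The sketch below is only an illustration: the function names and the starting proportions 1/2, 1/5, 3/10 are assumptions chosen here, not values from the text.

```python
from fractions import Fraction

def next_generation(p0, q0, r0):
    """Proportions (AA, aa, Aa) after one round of random mating."""
    a = p0 + r0 / 2          # P{A}: chance that a randomly chosen gene is type A
    b = q0 + r0 / 2          # P{a}
    return a * a, b * b, 2 * a * b

def descendant_matrix(p, q, r):
    """Transition matrix of the descendant chain, states ordered AA, aa, Aa."""
    return [[p + r / 2, 0, q + r / 2],
            [0, q + r / 2, p + r / 2],
            [p / 2 + r / 4, q / 2 + r / 4, p / 2 + q / 2 + r / 2]]

# Arbitrary illustrative starting proportions (not taken from the text).
p0, q0, r0 = Fraction(1, 2), Fraction(1, 5), Fraction(3, 10)
p, q, r = next_generation(p0, q0, r0)

# Hardy-Weinberg: a second round of random mating changes nothing.
second = next_generation(p, q, r)

# Stationarity: one step of the descendant chain started from (p, q, r).
P = descendant_matrix(p, q, r)
pi = [p, q, r]
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(pi, pi_next)
```

Because the arithmetic is exact, `second` and `pi_next` reproduce `(p, q, r)` exactly rather than approximately.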


Example 4.24 Suppose that a production process changes states in accordance
with an irreducible, positive recurrent Markov chain having transition probabilities
$P_{ij}$, $i, j = 1, \ldots, n$, and suppose that certain of the states are considered
acceptable and the remaining unacceptable. Let $A$ denote the acceptable states
and $A^c$ the unacceptable ones. If the production process is said to be “up” when
in an acceptable state and “down” when in an unacceptable state, determine

1. the rate at which the production process goes from up to down (that is, the rate of
breakdowns);
2. the average length of time the process remains down when it goes down; and
3. the average length of time the process remains up when it goes up.


Solution: Let $\pi_k$, $k = 1, \ldots, n$, denote the long-run proportions. Now for
$i \in A$ and $j \in A^c$ the rate at which the process enters state $j$ from state $i$ is

\[ \text{rate enter } j \text{ from } i = \pi_i P_{ij} \]

and so the rate at which the production process enters state $j$ from an acceptable
state is

\[ \text{rate enter } j \text{ from } A = \sum_{i \in A} \pi_i P_{ij} \]

Hence, the rate at which it enters an unacceptable state from an acceptable one
(which is the rate at which breakdowns occur) is

\[ \text{rate breakdowns occur} = \sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij} \tag{4.10} \]

Now let $U$ and $D$ denote the average time the process remains up when it
goes up and down when it goes down. Because there is a single breakdown
every $U + D$ time units on the average, it follows heuristically that

\[ \text{rate at which breakdowns occur} = \frac{1}{U + D} \]

and so from Equation (4.10),

\[ \frac{1}{U + D} = \sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij} \tag{4.11} \]

To obtain a second equation relating $U$ and $D$, consider the percentage of
time the process is up, which, of course, is equal to $\sum_{i \in A} \pi_i$. However, since the
process is up on the average $U$ out of every $U + D$ time units, it follows (again
somewhat heuristically) that the

\[ \text{proportion of up time} = \frac{U}{U + D} \]

and so

\[ \frac{U}{U + D} = \sum_{i \in A} \pi_i \tag{4.12} \]

Hence, from Equations (4.11) and (4.12) we obtain

\[
\begin{aligned}
U &= \frac{\sum_{i \in A} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}, \\
D &= \frac{1 - \sum_{i \in A} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}
   = \frac{\sum_{i \in A^c} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}
\end{aligned}
\]

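For example, suppose the transition probability matrix is

\[
P = \begin{Vmatrix}
\frac{1}{4} & \frac{1}{4} & \frac{1}{2} & 0 \\[0.5ex]
0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\[0.5ex]
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\[0.5ex]
\frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{2}
\end{Vmatrix}
\]

where the acceptable (up) states are 1, 2 and the unacceptable (down) ones are
3, 4. The limiting probabilities satisfy

\[
\begin{aligned}
\pi_1 &= \pi_1\tfrac{1}{4} + \pi_3\tfrac{1}{4} + \pi_4\tfrac{1}{4}, \\
\pi_2 &= \pi_1\tfrac{1}{4} + \pi_2\tfrac{1}{4} + \pi_3\tfrac{1}{4} + \pi_4\tfrac{1}{4}, \\
\pi_3 &= \pi_1\tfrac{1}{2} + \pi_2\tfrac{1}{2} + \pi_3\tfrac{1}{4}, \\
\pi_1 + \pi_2 + \pi_3 + \pi_4 &= 1
\end{aligned}
\]

These solve to yield

\[ \pi_1 = \frac{3}{16}, \quad \pi_2 = \frac{1}{4}, \quad \pi_3 = \frac{14}{48}, \quad \pi_4 = \frac{13}{48} \]

and thus

\[
\begin{aligned}
\text{rate of breakdowns} &= \pi_1(P_{13} + P_{14}) + \pi_2(P_{23} + P_{24}) = \frac{9}{32}, \\
U &= \frac{14}{9} \quad \text{and} \quad D = 2
\end{aligned}
\]

Hence, on the average, breakdowns occur about $\frac{9}{32}$ (or 28 percent) of the time.
They last, on the average, 2 time units, and then there follows a stretch of (on
the average) $\frac{14}{9}$ time units when the system is up. ■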
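
The numbers in this worked example follow directly from Equations (4.10) through (4.12). As a hedged sketch (the variable names and the zero-based indexing of states are choices made here, not the text's), the computation can be reproduced with exact fractions:

```python
from fractions import Fraction as F

# Transition matrix of the worked example; states 1..4 are stored at indices 0..3.
P = [[F(1, 4), F(1, 4), F(1, 2), F(0)],
     [F(0),    F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 4), F(0),    F(1, 2)]]
pi = [F(3, 16), F(1, 4), F(14, 48), F(13, 48)]   # long-run proportions quoted in the text
up, down = [0, 1], [2, 3]                         # acceptable states 1, 2; unacceptable states 3, 4

# Sanity check: pi is stationary for P.
assert all(sum(pi[i] * P[i][j] for i in range(4)) == pi[j] for j in range(4))

rate = sum(pi[i] * P[i][j] for i in up for j in down)   # Equation (4.10)
U = sum(pi[i] for i in up) / rate                        # mean length of an up period
D = sum(pi[j] for j in down) / rate                      # mean length of a down period
print(rate, U, D)
```

The three printed values are the breakdown rate 9/32 and the mean up and down times 14/9 and 2 found above.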
Remark The long-run proportions $\pi_j$, $j \ge 0$, are often called stationary probabilities.
The reason is that if the initial state is chosen according to the probabilities
$\pi_j$, $j \ge 0$, then the probability of being in state $j$ at any time $n$ is also equal
to $\pi_j$. That is, if

\[ P\{X_0 = j\} = \pi_j, \quad j \ge 0 \]

then

\[ P\{X_n = j\} = \pi_j \quad \text{for all } n,\ j \ge 0 \]

The preceding is easily proven by induction, for if we suppose it true for $n - 1$,
then we can write

\[
\begin{aligned}
P\{X_n = j\} &= \sum_i P\{X_n = j \mid X_{n-1} = i\}\,P\{X_{n-1} = i\} \\
&= \sum_i P_{ij}\,\pi_i \quad \text{by the induction hypothesis} \\
&= \pi_j \quad \text{by Equation (4.7)}
\end{aligned}
\]


Example 4.25 Suppose the numbers of families that check into a hotel on successive
days are independent Poisson random variables with mean $\lambda$. Also suppose
that the number of days that a family stays in the hotel is a geometric random
variable with parameter $p$, $0 < p < 1$. (Thus, a family who spent the previous
night in the hotel will, independently of how long they have already spent in the
hotel, check out the next day with probability $p$.) Also suppose that all families
act independently of each other. Under these conditions it is easy to see that if
$X_n$ denotes the number of families that are checked in the hotel at the beginning
of day $n$ then $\{X_n, n \ge 0\}$ is a Markov chain. Find

(a) the transition probabilities of this Markov chain;
(b) $E[X_n \mid X_0 = i]$;
(c) the stationary probabilities of this Markov chain.


Solution: (a) To find $P_{i,j}$, suppose there are $i$ families checked into the hotel
at the beginning of a day. Because each of these $i$ families will stay for another
day with probability $q = 1 - p$ it follows that $R_i$, the number of these families
that remain another day, is a binomial $(i, q)$ random variable. So, letting $N$ be
the number of new families that check in that day, we see that

\[ P_{i,j} = P(R_i + N = j) \]

Conditioning on $R_i$ and using that $N$ is Poisson with mean $\lambda$, we obtain

\[
\begin{aligned}
P_{i,j} &= \sum_{k=0}^{i} P(R_i + N = j \mid R_i = k)\binom{i}{k} q^k p^{i-k} \\
&= \sum_{k=0}^{i} P(N = j - k \mid R_i = k)\binom{i}{k} q^k p^{i-k} \\
&= \sum_{k=0}^{\min(i,j)} P(N = j - k)\binom{i}{k} q^k p^{i-k} \\
&= \sum_{k=0}^{\min(i,j)} e^{-\lambda}\frac{\lambda^{j-k}}{(j-k)!}\binom{i}{k} q^k p^{i-k}
\end{aligned}
\]


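(b) Using the preceding representation $R_i + N$ for the next state from state $i$,
we see that

\[ E[X_n \mid X_{n-1} = i] = E[R_i + N] = iq + \lambda \]

Consequently,

\[ E[X_n \mid X_{n-1}] = X_{n-1}q + \lambda \]

Taking expectations of both sides yields

\[ E[X_n] = \lambda + qE[X_{n-1}] \]

Iterating the preceding gives

\[
\begin{aligned}
E[X_n] &= \lambda + qE[X_{n-1}] \\
&= \lambda + q(\lambda + qE[X_{n-2}]) \\
&= \lambda + q\lambda + q^2E[X_{n-2}] \\
&= \lambda + q\lambda + q^2(\lambda + qE[X_{n-3}]) \\
&= \lambda + q\lambda + q^2\lambda + q^3E[X_{n-3}]
\end{aligned}
\]

showing that

\[ E[X_n] = \lambda\left(1 + q + q^2 + \cdots + q^{n-1}\right) + q^nE[X_0] \]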


and yielding the result

\[ E[X_n \mid X_0 = i] = \frac{\lambda(1 - q^n)}{p} + q^n i \]

(c) To find the stationary probabilities we will not directly use the complicated
transition probabilities derived in part (a). Rather we will make use of the fact
that the stationary probability distribution is the only distribution on the initial
state that results in the next state having the same distribution. Now, suppose
that the initial state $X_0$ has a Poisson distribution with mean $\alpha$. That is, assume
that the number of families initially in the hotel is Poisson with mean $\alpha$. Let $R$
denote the number of these families that remain in the hotel at the beginning of
the next day. Then, using the result of Example 3.23 that if each of a Poisson
distributed (with mean $\alpha$) number of events occurs with probability $q$, then
the total number of these events that occur is Poisson distributed with mean
$\alpha q$, it follows that $R$ is a Poisson random variable with mean $\alpha q$. In addition,
the number of new families that check in during the day, call it $N$, is Poisson
with mean $\lambda$, and is independent of $R$. Hence, since the sum of independent
Poisson random variables is also Poisson distributed, it follows that $R + N$,
the number of guests at the beginning of the next day, is Poisson with mean
$\lambda + \alpha q$. Consequently, if we choose $\alpha$ so that

\[ \alpha = \lambda + \alpha q \]

then the distribution of $X_1$ would be the same as that of $X_0$. But this means
that when the initial distribution of $X_0$ is Poisson with mean $\alpha = \lambda/p$ then so
is the distribution of $X_1$, implying that this is the stationary distribution. That
is, the stationary probabilities are

\[ \pi_i = e^{-\lambda/p}(\lambda/p)^i / i!, \quad i \ge 0 \]
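
Parts (a) through (c) can be cross-checked against one another numerically. The sketch below is an illustration under stated assumptions: the parameter values $\lambda = 2$, $p = 0.5$ and the truncation of the infinite state space at 80 are choices made here, and the function names are hypothetical. It evaluates the transition probabilities of part (a), confirms that one step of the chain started from the Poisson($\lambda/p$) distribution returns the same distribution, and compares the closed form of part (b) with the recursion $E[X_n] = \lambda + qE[X_{n-1}]$.

```python
from math import comb, exp, factorial

def poisson_pmf(mean, k):
    """P{Poisson(mean) = k}."""
    return exp(-mean) * mean ** k / factorial(k)

def transition_prob(i, j, lam, p):
    """P_{i,j} from part (a): binomial(i, q) survivors plus Poisson(lam) arrivals."""
    q = 1 - p
    return sum(comb(i, k) * q ** k * p ** (i - k) * poisson_pmf(lam, j - k)
               for k in range(min(i, j) + 1))

def mean_after(n, i, lam, p):
    """E[X_n | X_0 = i] from part (b)."""
    q = 1 - p
    return lam * (1 - q ** n) / p + q ** n * i

lam, p = 2.0, 0.5          # illustrative parameter values, not from the text
cutoff = 80                # truncation level for the state space (an assumption)
pi = [poisson_pmf(lam / p, i) for i in range(cutoff)]

# One step of the chain started from pi should reproduce pi (up to truncation error).
one_step = [sum(pi[i] * transition_prob(i, j, lam, p) for i in range(cutoff))
            for j in range(10)]

# Iterate E[X_n] = lam + q * E[X_{n-1}] from X_0 = 7 for five steps.
m = 7.0
for _ in range(5):
    m = lam + (1 - p) * m
print(one_step[:3], pi[:3], m, mean_after(5, 7, lam, p))
```

The truncation is harmless here because a Poisson distribution with mean 4 puts negligible mass beyond 80.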


The preceding model has an important generalization. Namely, consider an
organization whose workers are of $r$ distinct types. For instance, the organization
could be a law firm and its lawyers could either be juniors, associates, or
partners. Suppose that a worker who is currently type $i$ will in the next period
become type $j$ with probability $q_{i,j}$ for $j = 1, \ldots, r$ or will leave the organization
with probability $1 - \sum_{j=1}^{r} q_{i,j}$. In addition, suppose that new workers
are hired each period, and that the numbers of types $1, \ldots, r$ workers hired
are independent Poisson random variables with means $\lambda_1, \ldots, \lambda_r$. If we let
$\mathbf{X}_n = (X_n(1), \ldots, X_n(r))$, where $X_n(i)$ is the number of type $i$ workers in the
organization at the beginning of period $n$, then $\mathbf{X}_n$, $n \ge 0$, is a Markov chain. To
compute its stationary probability distribution, suppose that the initial state is
chosen so that the numbers of workers of different types are independent Poisson
random variables, with $\alpha_i$ being the mean number of type $i$ workers. That is,
suppose that $X_0(1), \ldots, X_0(r)$ are independent Poisson random variables with
respective means $\alpha_1, \ldots, \alpha_r$. Also, let $N_j$, $j = 1, \ldots, r$, be the number of new
type $j$ workers hired during the initial period. Now, fix $i$, and for $j = 1, \ldots, r$,
let $M_i(j)$ be the number of the $X_0(i)$ type $i$ workers who become type $j$ in
the next period. Then because each of the Poisson number $X_0(i)$ of type $i$
workers will independently become type $j$ with probability $q_{i,j}$, $j = 1, \ldots, r$, it
follows from the remarks following Example 3.23 that $M_i(1), \ldots, M_i(r)$ are
independent Poisson random variables with $M_i(j)$ having mean $\alpha_i q_{i,j}$. Because
$X_0(1), \ldots, X_0(r)$ are, by assumption, independent, we can also conclude that
the random variables $M_i(j)$, $i, j = 1, \ldots, r$, are all independent. Because the
sum of independent Poisson random variables is also Poisson distributed, the
preceding yields that the random variables

\[ X_1(j) = N_j + \sum_{i=1}^{r} M_i(j), \quad j = 1, \ldots, r \]

are independent Poisson random variables with means

\[ E[X_1(j)] = \lambda_j + \sum_{i=1}^{r} \alpha_i q_{i,j} \]

Hence, if $\alpha_1, \ldots, \alpha_r$ satisfied

\[ \alpha_j = \lambda_j + \sum_{i=1}^{r} \alpha_i q_{i,j}, \quad j = 1, \ldots, r \]

then $\mathbf{X}_1$ would have the same distribution as $\mathbf{X}_0$. Consequently, if we let
$\alpha_1^o, \ldots, \alpha_r^o$ be such that

\[ \alpha_j^o = \lambda_j + \sum_{i=1}^{r} \alpha_i^o q_{i,j}, \quad j = 1, \ldots, r \]

then the stationary distribution of the Markov chain is the distribution that
takes the number of workers in each type to be independent Poisson random
variables with means $\alpha_1^o, \ldots, \alpha_r^o$. That is,

\[ \lim_{n \to \infty} P\{\mathbf{X}_n = (k_1, \ldots, k_r)\} = \prod_{i=1}^{r} e^{-\alpha_i^o}(\alpha_i^o)^{k_i} / k_i! \]

It can be shown that there will be such values $\alpha_j^o$, $j = 1, \ldots, r$, provided
that, with probability 1, each worker eventually leaves the organization. Also,
because there is a unique stationary distribution, there can only be one such
set of values. ■


For state $j$, define $m_j$ to be the expected number of transitions until a Markov
chain, starting in state $j$, returns to that state. Since, on the average, the chain will
spend 1 unit of time in state $j$ for every $m_j$ units of time, it follows that

\[ \pi_j = \frac{1}{m_j} \]

In words, the proportion of time in state $j$ equals the inverse of the mean time
between visits to $j$. (The preceding is a special case of a general result, sometimes
called the strong law for renewal processes, which will be presented in Chapter 7.)


Example 4.26 (Mean Pattern Times in Markov Chain Generated Data) Consider
an irreducible Markov chain $\{X_n, n \ge 0\}$ with transition probabilities $P_{i,j}$ and
stationary probabilities $\pi_j$, $j \ge 0$. Starting in state $r$, we are interested in determining
the expected number of transitions until the pattern $i_1, i_2, \ldots, i_k$ appears.
That is, with

\[ N(i_1, i_2, \ldots, i_k) = \min\{n \ge k : X_{n-k+1} = i_1, \ldots, X_n = i_k\} \]

we are interested in

\[ E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] \]

Note that even if $i_1 = r$, the initial state $X_0$ is not considered part of the pattern
sequence.

Let $\mu(i, i_1)$ be the mean number of transitions for the chain to enter state $i_1$,
given that the initial state is $i$, $i \ge 0$. The quantities $\mu(i, i_1)$ can be determined as
the solution of the following set of equations, obtained by conditioning on the
first transition out of state $i$:

\[ \mu(i, i_1) = 1 + \sum_{j \ne i_1} P_{i,j}\,\mu(j, i_1), \quad i \ge 0 \]

For the Markov chain $\{X_n, n \ge 0\}$ associate a corresponding Markov chain,
which we will refer to as the $k$-chain, whose state at any time is the sequence
of the most recent $k$ states of the original chain. (For instance, if $k = 3$ and
$X_2 = 4$, $X_3 = 1$, $X_4 = 1$, then the state of the $k$-chain at time 4 is $(4, 1, 1)$.) Let
$\pi(j_1, \ldots, j_k)$ be the stationary probabilities for the $k$-chain. Because $\pi(j_1, \ldots, j_k)$
is the proportion of time that the state of the original Markov chain $k$ units ago
was $j_1$ and the following $k - 1$ states, in sequence, were $j_2, \ldots, j_k$, we can conclude
that

\[ \pi(j_1, \ldots, j_k) = \pi_{j_1} P_{j_1,j_2} \cdots P_{j_{k-1},j_k} \]

Moreover, because the mean number of transitions between successive visits of
the $k$-chain to the state $i_1, i_2, \ldots, i_k$ is equal to the inverse of the stationary probability
of that state, we have that

\[ E[\text{number of transitions between visits to } i_1, i_2, \ldots, i_k] = \frac{1}{\pi(i_1, \ldots, i_k)} \tag{4.13} \]

Let $A(i_1, \ldots, i_m)$ be the additional number of transitions needed until the pattern
appears, given that the first $m$ transitions have taken the chain into states
$X_1 = i_1, \ldots, X_m = i_m$.

We will now consider whether the pattern has overlaps, where we say that the
pattern $i_1, i_2, \ldots, i_k$ has an overlap of size $j$, $j < k$, if the sequence of its final $j$
elements is the same as that of its first $j$ elements. That is, it has an overlap of
size $j$ if

\[ (i_{k-j+1}, \ldots, i_k) = (i_1, \ldots, i_j), \quad j < k \]

Case 1 The pattern $i_1, i_2, \ldots, i_k$ has no overlaps.
Because there is no overlap, Equation (4.13) yields

\[ E[N(i_1, i_2, \ldots, i_k) \mid X_0 = i_k] = \frac{1}{\pi(i_1, \ldots, i_k)} \]

Because the time until the pattern occurs is equal to the time until the chain enters
state $i_1$ plus the additional time, we may write

\[ E[N(i_1, i_2, \ldots, i_k) \mid X_0 = i_k] = \mu(i_k, i_1) + E[A(i_1)] \]

The preceding two equations imply

\[ E[A(i_1)] = \frac{1}{\pi(i_1, \ldots, i_k)} - \mu(i_k, i_1) \]

Using that

\[ E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] = \mu(r, i_1) + E[A(i_1)] \]

gives the result

\[ E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] = \mu(r, i_1) + \frac{1}{\pi(i_1, \ldots, i_k)} - \mu(i_k, i_1) \]

where

\[ \pi(i_1, \ldots, i_k) = \pi_{i_1} P_{i_1,i_2} \cdots P_{i_{k-1},i_k} \]


Case 2 Now suppose that the pattern has overlaps and let its largest overlap be
of size $s$. In this case the number of transitions between successive visits of the
$k$-chain to the state $i_1, i_2, \ldots, i_k$ is equal to the additional number of transitions
of the original chain until the pattern appears given that it has already made $s$
transitions with the results $X_1 = i_1, \ldots, X_s = i_s$. Therefore, from Equation (4.13)

\[ E[A(i_1, \ldots, i_s)] = \frac{1}{\pi(i_1, \ldots, i_k)} \]

But because

\[ N(i_1, i_2, \ldots, i_k) = N(i_1, \ldots, i_s) + A(i_1, \ldots, i_s) \]

we have

\[ E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] = E[N(i_1, \ldots, i_s) \mid X_0 = r] + \frac{1}{\pi(i_1, \ldots, i_k)} \]

We can now repeat the same procedure on the pattern $i_1, \ldots, i_s$, continuing to do
so until we reach one that has no overlap, and then apply the result from Case 1.
For instance, suppose the desired pattern is 1, 2, 3, 1, 2, 3, 1, 2. Then

\[ E[N(1, 2, 3, 1, 2, 3, 1, 2) \mid X_0 = r] = E[N(1, 2, 3, 1, 2) \mid X_0 = r] + \frac{1}{\pi(1, 2, 3, 1, 2, 3, 1, 2)} \]

Because the largest overlap of the pattern (1, 2, 3, 1, 2) is of size 2, the same
argument as in the preceding gives

\[ E[N(1, 2, 3, 1, 2) \mid X_0 = r] = E[N(1, 2) \mid X_0 = r] + \frac{1}{\pi(1, 2, 3, 1, 2)} \]

Because the pattern (1, 2) has no overlap, we obtain from Case 1 that

\[ E[N(1, 2) \mid X_0 = r] = \mu(r, 1) + \frac{1}{\pi(1, 2)} - \mu(2, 1) \]

Putting it together yields

\[
E[N(1, 2, 3, 1, 2, 3, 1, 2) \mid X_0 = r] = \mu(r, 1) + \frac{1}{\pi_1 P_{1,2}} - \mu(2, 1)
+ \frac{1}{\pi_1 P_{1,2}^2 P_{2,3} P_{3,1}} + \frac{1}{\pi_1 P_{1,2}^3 P_{2,3}^2 P_{3,1}^2}
\]

If the generated data is a sequence of independent and identically distributed
random variables, with each value equal to $j$ with probability $P_j$, then the Markov
chain has $P_{i,j} = P_j$. In this case, $\pi_j = P_j$. Also, because the time to go from state $i$ to
state $j$ is a geometric random variable with parameter $P_j$, we have $\mu(i, j) = 1/P_j$.
Thus, the expected number of data values that need be generated before the
pattern 1, 2, 3, 1, 2, 3, 1, 2 appears would be

\[
\begin{aligned}
\frac{1}{P_1} + \frac{1}{P_1 P_2} - \frac{1}{P_1} + \frac{1}{P_1^2 P_2^2 P_3} + \frac{1}{P_1^3 P_2^3 P_3^2}
= \frac{1}{P_1 P_2} + \frac{1}{P_1^2 P_2^2 P_3} + \frac{1}{P_1^3 P_2^3 P_3^2}
\end{aligned}
\]
■
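
For independent and identically distributed data the Case 1 and Case 2 recursion collapses to a sum of terms $1/\pi$ over the pattern and its successive largest overlaps, because the two $\mu$ terms cancel. The following sketch implements that special case only; the function names are hypothetical, and the symbol probabilities used in the examples (equal thirds, and a fair coin) are illustrative assumptions rather than values from the text.

```python
from fractions import Fraction

def largest_overlap(pattern):
    """Largest j < len(pattern) such that the final j symbols equal the first j."""
    k = len(pattern)
    for j in range(k - 1, 0, -1):
        if pattern[k - j:] == pattern[:j]:
            return j
    return 0

def mean_pattern_time_iid(pattern, prob):
    """Expected number of i.i.d. draws until `pattern` first appears.

    `prob` maps each symbol to its probability. Each pass adds 1/pi for the
    current pattern (Case 2), then shrinks it to its largest overlap; a
    pattern with no overlap contributes 1/pi and ends the loop (Case 1 with
    mu(r, i_1) = mu(i_k, i_1) = 1/P_{i_1}).
    """
    pattern = tuple(pattern)
    total = Fraction(0)
    while pattern:
        pi = Fraction(1)
        for symbol in pattern:
            pi *= prob[symbol]
        total += 1 / pi
        pattern = pattern[:largest_overlap(pattern)]
    return total

thirds = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(mean_pattern_time_iid((1, 2, 3, 1, 2, 3, 1, 2), thirds))
print(mean_pattern_time_iid("HH", coin), mean_pattern_time_iid("HT", coin))
```

With equal thirds the formula above gives $3^2 + 3^5 + 3^8 = 6813$, and for a fair coin the familiar values 6 for HH and 4 for HT.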


The following result is quite useful. 


Proposition 4.3 Let $\{X_n, n \ge 1\}$ be an irreducible Markov chain with stationary
probabilities $\pi_j$, $j \ge 0$, and let $r$ be a bounded function on the state space. Then,
with probability 1,

\[ \lim_{N \to \infty} \frac{\sum_{n=1}^{N} r(X_n)}{N} = \sum_{j=0}^{\infty} r(j)\pi_j \]

Proof. If we let $a_j(N)$ be the amount of time the Markov chain spends in state $j$
during time periods $1, \ldots, N$, then

\[ \sum_{n=1}^{N} r(X_n) = \sum_{j=0}^{\infty} a_j(N)\,r(j) \]

Since $a_j(N)/N \to \pi_j$ the result follows from the preceding upon dividing by $N$
and then letting $N \to \infty$. ■

If we suppose that we earn a reward $r(j)$ whenever the chain is in state $j$, then
Proposition 4.3 states that our average reward per unit time is $\sum_j r(j)\pi_j$.


Example 4.27 For the four state Bonus Malus automobile insurance system specified
in Example 4.7, find the average annual premium paid by a policyholder
whose yearly number of claims is a Poisson random variable with mean 1/2.

Solution: With $a_k = e^{-1/2}\dfrac{(1/2)^k}{k!}$, we have

\[ a_0 = 0.6065, \qquad a_1 = 0.3033, \qquad a_2 = 0.0758 \]

Therefore, the Markov chain of successive states has the following transition
probability matrix:

\[
\begin{Vmatrix}
0.6065 & 0.3033 & 0.0758 & 0.0144 \\
0.6065 & 0.0000 & 0.3033 & 0.0902 \\
0.0000 & 0.6065 & 0.0000 & 0.3935 \\
0.0000 & 0.0000 & 0.6065 & 0.3935
\end{Vmatrix}
\]

The stationary probabilities are given as the solution of

\[
\begin{aligned}
\pi_1 &= 0.6065\pi_1 + 0.6065\pi_2, \\
\pi_2 &= 0.3033\pi_1 + 0.6065\pi_3, \\
\pi_3 &= 0.0758\pi_1 + 0.3033\pi_2 + 0.6065\pi_4, \\
\pi_1 + \pi_2 + \pi_3 + \pi_4 &= 1
\end{aligned}
\]

Rewriting the first three of these equations gives

\[
\begin{aligned}
\pi_2 &= \frac{1 - 0.6065}{0.6065}\,\pi_1, \\
\pi_3 &= \frac{\pi_2 - 0.3033\pi_1}{0.6065}, \\
\pi_4 &= \frac{\pi_3 - 0.0758\pi_1 - 0.3033\pi_2}{0.6065}
\end{aligned}
\]

or

\[
\begin{aligned}
\pi_2 &= 0.6488\pi_1, \\
\pi_3 &= 0.5697\pi_1, \\
\pi_4 &= 0.4900\pi_1
\end{aligned}
\]

Using that $\sum_{i=1}^{4} \pi_i = 1$ gives the solution (rounded to four decimal places)

\[ \pi_1 = 0.3692, \quad \pi_2 = 0.2395, \quad \pi_3 = 0.2103, \quad \pi_4 = 0.1809 \]

Therefore, the average annual premium paid is

\[ 200\pi_1 + 250\pi_2 + 400\pi_3 + 600\pi_4 = 326.375 \]
■
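
The back-substitution used in this solution is easy to reproduce. The sketch below is an illustration, not the text's own computation: the variable names are chosen here, and it uses unrounded Poisson probabilities, so its output can differ from the rounded figures above in the last displayed digit. The premiums 200, 250, 400, 600 are those appearing in the final formula of the example.

```python
from math import exp, factorial

lam = 0.5
a = [exp(-lam) * lam ** k / factorial(k) for k in range(3)]   # a_0, a_1, a_2

# Express pi_2, pi_3, pi_4 as multiples of pi_1, following the rewritten equations.
ratio = [1.0]
ratio.append((1 - a[0]) / a[0] * ratio[0])
ratio.append((ratio[1] - a[1] * ratio[0]) / a[0])
ratio.append((ratio[2] - a[2] * ratio[0] - a[1] * ratio[1]) / a[0])

total = sum(ratio)
pi = [x / total for x in ratio]          # normalize so the probabilities sum to 1
premiums = [200, 250, 400, 600]          # premiums used in the text's final formula
average = sum(c * x for c, x in zip(premiums, pi))
print([round(x, 4) for x in pi], round(average, 2))
```

By Proposition 4.3, `average` is the long-run average reward $\sum_j r(j)\pi_j$ with $r(j)$ taken to be the premium in state $j$.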
4.5.1 The Gambler’s Ruin Problem


Consider a gambler who at each play of the game has probability $p$ of winning
one unit and probability $q = 1 - p$ of losing one unit. Assuming that successive
plays of the game are independent, what is the probability that, starting with $i$
units, the gambler’s fortune will reach $N$ before reaching 0?

If we let $X_n$ denote the player’s fortune at time $n$, then the process
$\{X_n, n = 0, 1, 2, \ldots\}$ is a Markov chain with transition probabilities

\[
\begin{aligned}
P_{00} &= P_{NN} = 1, \\
P_{i,i+1} &= p = 1 - P_{i,i-1}, \quad i = 1, 2, \ldots, N - 1
\end{aligned}
\]

This Markov chain has three classes, namely, $\{0\}$, $\{1, 2, \ldots, N - 1\}$, and $\{N\}$; the
first and third class being recurrent and the second transient. Since each transient
state is visited only finitely often, it follows that, after some finite amount of time,
the gambler will either attain his goal of $N$ or go broke.

Let $P_i$, $i = 0, 1, \ldots, N$, denote the probability that, starting with $i$, the gambler’s
fortune will eventually reach $N$. By conditioning on the outcome of the initial
play of the game we obtain

\[ P_i = pP_{i+1} + qP_{i-1}, \quad i = 1, 2, \ldots, N - 1 \]

or equivalently, since $p + q = 1$,

\[ pP_i + qP_i = pP_{i+1} + qP_{i-1} \]

or

\[ P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}), \quad i = 1, 2, \ldots, N - 1 \]


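Hence, since $P_0 = 0$, we obtain from the preceding line that

\[
\begin{aligned}
P_2 - P_1 &= \frac{q}{p}(P_1 - P_0) = \frac{q}{p}P_1, \\
P_3 - P_2 &= \frac{q}{p}(P_2 - P_1) = \left(\frac{q}{p}\right)^2 P_1, \\
&\ \ \vdots \\
P_i - P_{i-1} &= \frac{q}{p}(P_{i-1} - P_{i-2}) = \left(\frac{q}{p}\right)^{i-1} P_1, \\
&\ \ \vdots \\
P_N - P_{N-1} &= \frac{q}{p}(P_{N-1} - P_{N-2}) = \left(\frac{q}{p}\right)^{N-1} P_1
\end{aligned}
\]

Adding the first $i - 1$ of these equations yields

\[ P_i - P_1 = P_1\left[\frac{q}{p} + \left(\frac{q}{p}\right)^2 + \cdots + \left(\frac{q}{p}\right)^{i-1}\right] \]

or

\[
P_i = \begin{cases}
\dfrac{1 - (q/p)^i}{1 - (q/p)}\,P_1, & \text{if } \dfrac{q}{p} \ne 1 \\[2ex]
iP_1, & \text{if } \dfrac{q}{p} = 1
\end{cases}
\]

Now, using the fact that $P_N = 1$, we obtain

\[
P_1 = \begin{cases}
\dfrac{1 - (q/p)}{1 - (q/p)^N}, & \text{if } p \ne \frac{1}{2} \\[2ex]
\dfrac{1}{N}, & \text{if } p = \frac{1}{2}
\end{cases}
\]

and hence

\[
P_i = \begin{cases}
\dfrac{1 - (q/p)^i}{1 - (q/p)^N}, & \text{if } p \ne \frac{1}{2} \\[2ex]
\dfrac{i}{N}, & \text{if } p = \frac{1}{2}
\end{cases}
\tag{4.14}
\]

Note that, as $N \to \infty$,

\[
P_i \to \begin{cases}
1 - \left(\dfrac{q}{p}\right)^i, & \text{if } p > \frac{1}{2} \\[2ex]
0, & \text{if } p \le \frac{1}{2}
\end{cases}
\]

Thus, if $p > \frac{1}{2}$ there is a positive probability that the gambler’s fortune will
increase indefinitely; while if $p \le \frac{1}{2}$ the gambler will, with probability 1, go
broke against an infinitely rich adversary.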


Example 4.28 Suppose Max and Patty decide to flip pennies; the one coming 
closest to the wall wins. Patty, being the better player, has a probability 0.6 of 
winning on each flip. (a) If Patty starts with five pennies and Max with ten, what 
is the probability that Patty will wipe Max out? (b) What if Patty starts with 10 
and Max with 20? 


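Solution: (a) The desired probability is obtained from Equation (4.14) by
letting $i = 5$, $N = 15$, and $p = 0.6$. Hence, the desired probability is

\[ \frac{1 - \left(\frac{2}{3}\right)^5}{1 - \left(\frac{2}{3}\right)^{15}} \approx 0.87 \]

(b) The desired probability is

\[ \frac{1 - \left(\frac{2}{3}\right)^{10}}{1 - \left(\frac{2}{3}\right)^{30}} \approx 0.98 \]
■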
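
Equation (4.14) is straightforward to evaluate directly. The sketch below (the function name is a choice made here) computes it for both parts of this example; the `p == 0.5` branch handles the fair-game case $i/N$.

```python
def reach_goal_prob(i, N, p):
    """Probability of reaching N before 0 from fortune i, as in Equation (4.14)."""
    if p == 0.5:
        return i / N
    ratio = (1 - p) / p          # q/p
    return (1 - ratio ** i) / (1 - ratio ** N)

print(round(reach_goal_prob(5, 15, 0.6), 2))    # part (a)
print(round(reach_goal_prob(10, 30, 0.6), 2))   # part (b)
```

A useful sanity check on any such implementation is that the returned values satisfy $P_i = pP_{i+1} + qP_{i-1}$ with $P_0 = 0$ and $P_N = 1$.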


For an application of the gambler’s ruin problem to drug testing, suppose that
two new drugs have been developed for treating a certain disease. Drug $i$ has
a cure rate $P_i$, $i = 1, 2$, in the sense that each patient treated with drug $i$ will
be cured with probability $P_i$. These cure rates, however, are not known, and
suppose we are interested in a method for deciding whether $P_1 > P_2$ or $P_2 > P_1$.
To decide upon one of these alternatives, consider the following test: Pairs of
patients are treated sequentially with one member of the pair receiving drug 1
and the other drug 2. The results for each pair are determined, and the testing
stops when the cumulative number of cures using one of the drugs exceeds the
cumulative number of cures when using the other by some fixed predetermined
number. More formally, let

\[
X_j = \begin{cases}
1, & \text{if the patient in the } j\text{th pair to receive drug number 1 is cured} \\
0, & \text{otherwise}
\end{cases}
\]

\[
Y_j = \begin{cases}
1, & \text{if the patient in the } j\text{th pair to receive drug number 2 is cured} \\
0, & \text{otherwise}
\end{cases}
\]

For a predetermined positive integer $M$ the test stops after pair $N$ where $N$ is
the first value of $n$ such that either

\[ X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = M \]

or

\[ X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = -M \]

In the former case we then assert that $P_1 > P_2$, and in the latter that $P_2 > P_1$.

In order to help ascertain whether the preceding is a good test, one thing we
would like to know is the probability of it leading to an incorrect decision. That is,
for given $P_1$ and $P_2$ where $P_1 > P_2$, what is the probability that the test will
incorrectly assert that $P_2 > P_1$? To determine this probability, note that after
each pair is checked the cumulative difference of cures using drug 1 versus drug
2 will either go up by 1 with probability $P_1(1 - P_2)$ (since this is the probability
that drug 1 leads to a cure and drug 2 does not), or go down by 1 with probability
$(1 - P_1)P_2$, or remain the same with probability $P_1P_2 + (1 - P_1)(1 - P_2)$. Hence,
if we only consider those pairs in which the cumulative difference changes, then
the difference will go up 1 with probability

\[
p = P\{\text{up 1} \mid \text{up 1 or down 1}\} = \frac{P_1(1 - P_2)}{P_1(1 - P_2) + (1 - P_1)P_2}
\]

and down 1 with probability

\[ q = 1 - p = \frac{P_2(1 - P_1)}{P_1(1 - P_2) + (1 - P_1)P_2} \]

Hence, the probability that the test will assert that $P_2 > P_1$ is equal to the
probability that a gambler who wins each (one unit) bet with probability $p$ will
go down $M$ before going up $M$. But Equation (4.14) with $i = M$, $N = 2M$, shows
that this probability is given by

\[
\begin{aligned}
P\{\text{test asserts that } P_2 > P_1\} &= 1 - \frac{1 - (q/p)^M}{1 - (q/p)^{2M}} \\
&= 1 - \frac{1}{1 + (q/p)^M} \\
&= \frac{1}{1 + (p/q)^M}
\end{aligned}
\]

Thus, for instance, if $P_1 = 0.6$ and $P_2 = 0.4$ then the probability of an incorrect
decision is 0.017 when $M = 5$ and reduces to 0.0003 when $M = 10$.
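
The error probability of the sequential test can be tabulated for any cure rates and threshold. The sketch below uses hypothetical function names and evaluates both the closed form $1/(1 + (p/q)^M)$ and the gambler's ruin form it was derived from; the second function assumes $P_1 \ne P_2$ so that $q/p \ne 1$.

```python
def wrong_decision_prob(P1, P2, M):
    """P{test asserts P2 > P1} via the closed form 1 / (1 + (p/q)^M)."""
    up = P1 * (1 - P2)        # drug 1 cures, drug 2 does not
    down = (1 - P1) * P2      # drug 2 cures, drug 1 does not
    return 1 / (1 + (up / down) ** M)   # p/q equals up/down

def wrong_decision_prob_via_ruin(P1, P2, M):
    """Same quantity as 1 - P_M from Equation (4.14) with N = 2M (needs P1 != P2)."""
    up = P1 * (1 - P2)
    down = (1 - P1) * P2
    ratio = down / up         # q/p
    return 1 - (1 - ratio ** M) / (1 - ratio ** (2 * M))

for M in (5, 10):
    print(M, wrong_decision_prob(0.6, 0.4, M))
```

Doubling $M$ roughly squares the error probability when it is small, at the cost of a longer test.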


4.5.2 A Model for Algorithmic Efficiency 


The following optimization problem is called a linear program: 


\[
\begin{aligned}
\text{minimize } \ & cx, \\
\text{subject to } \ & Ax = b, \\
& x \ge 0
\end{aligned}
\]

where $A$ is an $m \times n$ matrix of fixed constants; $c = (c_1, \ldots, c_n)$ and
$b = (b_1, \ldots, b_m)$ are vectors of fixed constants; and $x = (x_1, \ldots, x_n)$ is the $n$-vector of
nonnegative values that is to be chosen to minimize $cx = \sum_{i=1}^{n} c_i x_i$. Supposing
that $n > m$, it can be shown that the optimal $x$ can always be chosen to have at
least $n - m$ components equal to 0; that is, it can always be taken to be one of
the so-called extreme points of the feasibility region.

The simplex algorithm solves this linear program by moving from an extreme
point of the feasibility region to a better (in terms of the objective function $cx$)
extreme point (via the pivot operation) until the optimal is reached. Because there
can be as many as $N = \binom{n}{m}$ such extreme points, it would seem that this method
might take many iterations, but, surprisingly to some, this does not appear to be
the case in practice.

To obtain a feel for whether or not the preceding statement is surprising, let us 
consider a simple probabilistic (Markov chain) model as to how the algorithm 
moves along the extreme points. Specifically, we will suppose that if at any time 
the algorithm is at the jth best extreme point then after the next pivot the resulting 
extreme point is equally likely to be any of the j — 1 best. Under this assumption, 
we show that the time to get from the Nth best to the best extreme point has 
approximately, for large N, a normal distribution with mean and variance equal 
to the logarithm (base e) of N. 

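Consider a Markov chain for which $P_{11} = 1$ and

\[ P_{ij} = \frac{1}{i - 1}, \quad j = 1, \ldots, i - 1, \ i > 1 \]

and let $T_i$ denote the number of transitions needed to go from state $i$ to state 1.
A recursive formula for $E[T_i]$ can be obtained by conditioning on the initial
transition:

\[ E[T_i] = 1 + \frac{1}{i - 1}\sum_{j=1}^{i-1} E[T_j] \]

Starting with $E[T_1] = 0$, we successively see that

\[
\begin{aligned}
E[T_2] &= 1, \\
E[T_3] &= 1 + \tfrac{1}{2}, \\
E[T_4] &= 1 + \tfrac{1}{3}\left(1 + 1 + \tfrac{1}{2}\right) = 1 + \tfrac{1}{2} + \tfrac{1}{3}
\end{aligned}
\]

and it is not difficult to guess and then prove inductively that

\[ E[T_i] = \sum_{j=1}^{i-1} 1/j \]

However, to obtain a more complete description of $T_N$, we will use the representation

\[ T_N = \sum_{j=1}^{N-1} I_j \]

where

\[
I_j = \begin{cases}
1, & \text{if the process ever enters } j \\
0, & \text{otherwise}
\end{cases}
\]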

The importance of the preceding representation stems from the following: 


Proposition 4.4 $I_1, \ldots, I_{N-1}$ are independent and

\[ P\{I_j = 1\} = 1/j, \quad 1 \le j \le N - 1 \]

Proof. Given $I_{j+1}, \ldots, I_N$, let $n = \min\{i : i > j, I_i = 1\}$ denote the lowest numbered
state, greater than $j$, that is entered. Thus we know that the process enters
state $n$ and the next state entered is one of the states $1, 2, \ldots, j$. Hence, as the
next state from state $n$ is equally likely to be any of the lower number states
$1, 2, \ldots, n - 1$ we see that

\[ P\{I_j = 1 \mid I_{j+1}, \ldots, I_N\} = \frac{1/(n - 1)}{j/(n - 1)} = 1/j \]

Hence, $P\{I_j = 1\} = 1/j$, and independence follows since the preceding conditional
probability does not depend on $I_{j+1}, \ldots, I_N$. ■

Corollary 4.5

(i) $E[T_N] = \sum_{j=1}^{N-1} 1/j$.
(ii) $\mathrm{Var}(T_N) = \sum_{j=1}^{N-1} (1/j)(1 - 1/j)$.
(iii) For $N$ large, $T_N$ has approximately a normal distribution with mean $\log N$ and
variance $\log N$.

Proof. Parts (i) and (ii) follow from Proposition 4.4 and the representation
$T_N = \sum_{j=1}^{N-1} I_j$. Part (iii) follows from the central limit theorem since

\[ \int_1^N \frac{dx}{x} < \sum_{j=1}^{N-1} 1/j < 1 + \int_1^{N-1} \frac{dx}{x} \]

or

\[ \log N < \sum_{j=1}^{N-1} 1/j < 1 + \log(N - 1) \]

and so

\[ \log N \approx \sum_{j=1}^{N-1} 1/j \]
■
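
The recursion for $E[T_i]$ and the closed forms in Corollary 4.5 can be compared directly. The following sketch is illustrative only: the function name and the choice $N = 200$ are assumptions made here. It iterates the recursion in exact arithmetic and checks the result against the harmonic sum and the logarithmic bounds used in the proof.

```python
from fractions import Fraction
from math import log

def mean_transitions(N):
    """Return a list ET with ET[i] = E[T_i] for i = 1..N (index 0 is unused).

    Uses E[T_i] = 1 + (1/(i-1)) * sum_{j<i} E[T_j], starting from E[T_1] = 0.
    """
    ET = [Fraction(0), Fraction(0)]          # ET[1] = 0
    running = Fraction(0)                    # sum of ET[1..i-1]
    for i in range(2, N + 1):
        running += ET[i - 1]
        ET.append(1 + running / (i - 1))
    return ET

N = 200
ET = mean_transitions(N)
harmonic = sum(Fraction(1, j) for j in range(1, N))                          # Corollary 4.5(i)
variance = sum(Fraction(1, j) * (1 - Fraction(1, j)) for j in range(1, N))   # Corollary 4.5(ii)
print(float(ET[N]), float(variance), log(N))
```

For moderate $N$ the mean exceeds $\log N$ by roughly a constant, which is consistent with the bounds $\log N < E[T_N] < 1 + \log(N - 1)$.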


Returning to the simplex algorithm, if we assume that $n$, $m$, and $n - m$ are all
large, we have by Stirling's approximation that

$$N = \binom{n}{m} \sim \frac{n^{n+1/2}}{(n-m)^{n-m+1/2}\, m^{m+1/2} \sqrt{2\pi}}$$

and so, letting $c = n/m$,

$$\log N \sim \left(mc + \tfrac{1}{2}\right)\log(mc) - \left(m(c-1) + \tfrac{1}{2}\right)\log(m(c-1)) - \left(m + \tfrac{1}{2}\right)\log m - \tfrac{1}{2}\log(2\pi)$$

or

$$\log N \sim m\left[c \log\frac{c}{c-1} + \log(c-1)\right]$$

Now, as $\lim_{x\to\infty} x\log[x/(x-1)] = 1$, it follows that, when $c$ is large,

$$\log N \sim m[1 + \log(c-1)]$$

Thus, for instance, if $n = 8000$, $m = 1000$, then the number of necessary
transitions is approximately normally distributed with mean and variance equal to
$1000(1 + \log 7) \approx 3000$. Hence, the number of necessary transitions would be
roughly between

$$3000 \pm 2\sqrt{3000} \quad \text{or roughly} \quad 3000 \pm 110$$

95 percent of the time.
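To see how rough the rounding to 3000 is, the quantities in this example can be evaluated directly. The sketch below is only an illustration: it computes $\log\binom{n}{m}$ through the standard library function `math.lgamma` (a choice made here, not something the text uses) and compares it with the large-$c$ approximation $m[1 + \log(c-1)]$.

```python
import math

def log_binomial(n, m):
    """log of the binomial coefficient C(n, m), via log-gamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)

def large_c_approximation(n, m):
    """The approximation log N ~ m[1 + log(c - 1)] with c = n/m, valid for large c."""
    c = n / m
    return m * (1 + math.log(c - 1))

n, m = 8000, 1000
exact = log_binomial(n, m)            # log N with N = C(8000, 1000)
approx = large_c_approximation(n, m)  # 1000 * (1 + log 7)
half_width = 2 * math.sqrt(3000)      # two standard deviations
print(exact, approx, half_width)
```

Both values are within a few percent of 3000, and two standard deviations come to about 110, consistent with the interval quoted above.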


4.5.3 Using a Random Walk to Analyze a Probabilistic Algorithm for 
the Satisfiability Problem 


Consider a Markov chain with states $0, 1, \ldots, n$ having

$$P_{0,1} = 1, \quad P_{i,i+1} = p, \quad P_{i,i-1} = q = 1 - p, \quad 1 \le i < n$$

and suppose that we are interested in studying the time that it takes for the
chain to go from state 0 to state $n$. One approach to obtaining the mean time to
reach state $n$ would be to let $m_i$ denote the mean time to go from state $i$ to state
$n$, $i = 0, \ldots, n-1$. If we then condition on the initial transition, we obtain the
following set of equations:

$$\begin{aligned}
m_0 &= 1 + m_1, \\
m_i &= E[\text{time to reach } n \mid \text{next state is } i+1]\,p + E[\text{time to reach } n \mid \text{next state is } i-1]\,q \\
&= (1 + m_{i+1})p + (1 + m_{i-1})q \\
&= 1 + p\,m_{i+1} + q\,m_{i-1}, \quad i = 1, \ldots, n-1
\end{aligned}$$
Whereas the preceding equations can be solved for $m_i$, $i = 0, \ldots, n-1$, we do not
pursue their solution; we instead make use of the special structure of the Markov
chain to obtain a simpler set of equations. To start, let $N_i$ denote the number
of additional transitions that it takes the chain when it first enters state $i$ until
it enters state $i+1$. By the Markovian property, it follows that these random
variables $N_i$, $i = 0, \ldots, n-1$ are independent. Also, we can express $N_{0,n}$, the
number of transitions that it takes the chain to go from state 0 to state $n$, as

$$N_{0,n} = \sum_{i=0}^{n-1} N_i \qquad (4.15)$$

Letting $\mu_i = E[N_i]$ we obtain, upon conditioning on the next transition after the
chain enters state $i$, that for $i = 1, \ldots, n-1$

$$\mu_i = 1 + E[\text{number of additional transitions to reach } i+1 \mid \text{chain to } i-1]\,q$$

Now, if the chain next enters state $i-1$, then in order for it to reach $i+1$ it must
first return to state $i$ and must then go from state $i$ to state $i+1$. Hence, we have
from the preceding that

$$\mu_i = 1 + E[N^*_{i-1} + N^*_i]\,q$$

where $N^*_{i-1}$ and $N^*_i$ are, respectively, the additional number of transitions to
return to state $i$ from $i-1$ and the number to then go from $i$ to $i+1$. Now, it
follows from the Markovian property that these random variables have, respectively,
the same distributions as $N_{i-1}$ and $N_i$. In addition, they are independent
(although we will only use this when we compute the variance of $N_{0,n}$). Hence, we
see that

$$\mu_i = 1 + q(\mu_{i-1} + \mu_i)$$

or

$$\mu_i = \frac{1}{p} + \frac{q}{p}\,\mu_{i-1}, \quad i = 1, \ldots, n-1$$


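Starting with $\mu_0 = 1$, and letting $\alpha = q/p$, we obtain from the preceding recursion
that

$$\begin{aligned}
\mu_1 &= 1/p + \alpha, \\
\mu_2 &= 1/p + \alpha(1/p + \alpha) = 1/p + \alpha/p + \alpha^2, \\
\mu_3 &= 1/p + \alpha(1/p + \alpha/p + \alpha^2) \\
&= 1/p + \alpha/p + \alpha^2/p + \alpha^3
\end{aligned}$$

In general, we see that

$$\mu_i = \frac{1}{p}\sum_{j=0}^{i-1} \alpha^j + \alpha^i, \quad i = 1, \ldots, n-1 \qquad (4.16)$$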


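Using Equation (4.15), we now get

$$E[N_{0,n}] = 1 + \frac{1}{p}\sum_{i=1}^{n-1}\sum_{j=0}^{i-1} \alpha^j + \sum_{i=1}^{n-1} \alpha^i$$

When $p = \tfrac{1}{2}$ and so $\alpha = 1$, we see from the preceding that

$$E[N_{0,n}] = 1 + (n-1)n + n - 1 = n^2$$

When $p \ne \tfrac{1}{2}$ we obtain

$$\begin{aligned}
E[N_{0,n}] &= 1 + \frac{1}{p(1-\alpha)}\sum_{i=1}^{n-1}(1 - \alpha^i) + \frac{\alpha - \alpha^n}{1-\alpha} \\
&= 1 + \frac{1+\alpha}{1-\alpha}\left[n - 1 - \frac{\alpha - \alpha^n}{1-\alpha}\right] + \frac{\alpha - \alpha^n}{1-\alpha} \\
&= 1 + \frac{2\alpha^{n+1} - (n+1)\alpha^2 + n - 1}{(1-\alpha)^2}
\end{aligned}$$

where the second equality used the fact that $p = 1/(1+\alpha)$. Therefore, we see
that when $\alpha > 1$, or equivalently when $p < \tfrac{1}{2}$, the expected number of
transitions to reach $n$ is an exponentially increasing function of $n$. On the other hand,
when $p = \tfrac{1}{2}$, $E[N_{0,n}] = n^2$, and when $p > \tfrac{1}{2}$, $E[N_{0,n}]$ is, for large $n$,
essentially linear in $n$.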
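The recursion for the $\mu_i$ and the closed form for $E[N_{0,n}]$ are easy to compare numerically. The following sketch sums the $\mu_i$ produced by the recursion $\mu_i = 1/p + (q/p)\mu_{i-1}$, $\mu_0 = 1$, and evaluates the closed-form expression; the function names and sample values of $n$ and $p$ are illustrative assumptions.

```python
import math

def mean_hitting_time(n, p):
    """E[N_{0,n}] as the sum of mu_0..mu_{n-1}, with mu_i = 1/p + (q/p) * mu_{i-1}, mu_0 = 1."""
    q = 1 - p
    mu = 1.0          # mu_0
    total = mu
    for _ in range(1, n):
        mu = 1 / p + (q / p) * mu
        total += mu
    return total

def mean_hitting_time_closed_form(n, p):
    """Closed form: n^2 when p = 1/2, otherwise 1 + (2a^(n+1) - (n+1)a^2 + n - 1)/(1 - a)^2."""
    if p == 0.5:
        return float(n * n)
    a = (1 - p) / p
    return 1 + (2 * a ** (n + 1) - (n + 1) * a ** 2 + n - 1) / (1 - a) ** 2

for p in (0.4, 0.5, 0.6):
    print(p, mean_hitting_time(20, p), mean_hitting_time_closed_form(20, p))
```

For $n = 20$ the three cases differ sharply: the mean is in the tens of thousands for $p = 0.4$, equals $400$ for $p = \tfrac12$, and is under $100$ for $p = 0.6$.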

Let us now compute $\operatorname{Var}(N_{0,n})$. To do so, we will again make use of the
representation given by Equation (4.15). Letting $v_i = \operatorname{Var}(N_i)$, we start by determining
the $v_i$ recursively by using the conditional variance formula. Let $S_i = 1$ if the first
transition out of state $i$ is into state $i+1$, and let $S_i = -1$ if the transition is into
state $i-1$, $i = 1, \ldots, n-1$. Then,

$$\begin{aligned}
\text{given that } S_i = 1&: \quad N_i = 1 \\
\text{given that } S_i = -1&: \quad N_i = 1 + N^*_{i-1} + N^*_i
\end{aligned}$$

Hence,

$$\begin{aligned}
E[N_i \mid S_i = 1] &= 1, \\
E[N_i \mid S_i = -1] &= 1 + \mu_{i-1} + \mu_i
\end{aligned}$$

implying that

$$\begin{aligned}
\operatorname{Var}(E[N_i \mid S_i]) &= \operatorname{Var}(E[N_i \mid S_i] - 1) \\
&= (\mu_{i-1} + \mu_i)^2 q - (\mu_{i-1} + \mu_i)^2 q^2 \\
&= qp(\mu_{i-1} + \mu_i)^2
\end{aligned}$$

Also, since $N^*_{i-1}$ and $N^*_i$, the numbers of transitions to return from state $i-1$
to $i$ and to then go from state $i$ to state $i+1$ are, by the Markovian property,
independent random variables having the same distributions as $N_{i-1}$ and $N_i$,
respectively, we see that

$$\begin{aligned}
\operatorname{Var}(N_i \mid S_i = 1) &= 0, \\
\operatorname{Var}(N_i \mid S_i = -1) &= v_{i-1} + v_i
\end{aligned}$$

Hence,

$$E[\operatorname{Var}(N_i \mid S_i)] = q(v_{i-1} + v_i)$$

From the conditional variance formula, we thus obtain

$$v_i = pq(\mu_{i-1} + \mu_i)^2 + q(v_{i-1} + v_i)$$

or, equivalently,

$$v_i = q(\mu_{i-1} + \mu_i)^2 + \alpha v_{i-1}, \quad i = 1, \ldots, n-1$$

Starting with $v_0 = 0$, we obtain from the preceding recursion that

$$\begin{aligned}
v_1 &= q(\mu_0 + \mu_1)^2, \\
v_2 &= q(\mu_1 + \mu_2)^2 + \alpha q(\mu_0 + \mu_1)^2, \\
v_3 &= q(\mu_2 + \mu_3)^2 + \alpha q(\mu_1 + \mu_2)^2 + \alpha^2 q(\mu_0 + \mu_1)^2
\end{aligned}$$

In general, we have for $i > 0$,

$$v_i = q\sum_{j=1}^{i} \alpha^{i-j}(\mu_{j-1} + \mu_j)^2 \qquad (4.17)$$

Therefore, we see that

$$\operatorname{Var}(N_{0,n}) = \sum_{i=0}^{n-1} v_i = q\sum_{i=1}^{n-1}\sum_{j=1}^{i} \alpha^{i-j}(\mu_{j-1} + \mu_j)^2$$

where $\mu_j$ is given by Equation (4.16).

We see from Equations (4.16) and (4.17) that when $p \ge \tfrac{1}{2}$, and so $\alpha \le 1$,
$\mu_i$ and $v_i$, the mean and variance of the number of transitions to go from
state $i$ to $i+1$, do not increase too rapidly in $i$. For instance, when $p = \tfrac{1}{2}$ it
follows from Equations (4.16) and (4.17) that

$$\mu_i = 2i + 1$$

and

$$v_i = \frac{1}{2}\sum_{j=1}^{i} (4j)^2 = 8\sum_{j=1}^{i} j^2$$

Hence, since $N_{0,n}$ is the sum of independent random variables, which are of
roughly similar magnitudes when $p \ge \tfrac{1}{2}$, it follows in this case from the central
limit theorem that $N_{0,n}$ is, for large $n$, approximately normally distributed.
In particular, when $p = \tfrac{1}{2}$, $N_{0,n}$ is approximately normal with mean $n^2$ and
variance

$$\begin{aligned}
\operatorname{Var}(N_{0,n}) &= 8\sum_{i=1}^{n-1}\sum_{j=1}^{i} j^2 \\
&= 8\sum_{j=1}^{n-1}\sum_{i=j}^{n-1} j^2 \\
&= 8\sum_{j=1}^{n-1} (n-j)j^2 \\
&\approx 8\int_1^{n-1} (n-x)x^2\,dx \\
&\approx \tfrac{2}{3}n^4
\end{aligned}$$
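The variance recursion can be run alongside the mean recursion to see how close the exact variance is to the approximation $\tfrac{2}{3}n^4$ when $p = \tfrac12$. This is a minimal sketch; the function name and the choice $n = 100$ are illustrative assumptions.

```python
def hitting_time_variance(n, p):
    """Var(N_{0,n}) = v_0 + ... + v_{n-1}, using
    mu_i = 1/p + a*mu_{i-1} and v_i = q*(mu_{i-1} + mu_i)^2 + a*v_{i-1}, with a = q/p."""
    q = 1 - p
    a = q / p
    mu_prev, v_prev = 1.0, 0.0   # mu_0 = 1, v_0 = 0
    total = v_prev
    for _ in range(1, n):
        mu = 1 / p + a * mu_prev
        v = q * (mu_prev + mu) ** 2 + a * v_prev
        total += v
        mu_prev, v_prev = mu, v
    return total

n = 100
exact = hitting_time_variance(n, 0.5)
approx = 2 * n ** 4 / 3
print(exact, approx, exact / approx)
```

For $n = 100$ the exact sum $8\sum_{j=1}^{n-1}(n-j)j^2$ and the integral approximation differ by about one part in ten thousand.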
Example 4.29 (The Satisfiability Problem) A Boolean variable $x$ is one that takes
on either of two values: TRUE or FALSE. If $x_i$, $i \ge 1$ are Boolean variables, then
a Boolean clause of the form

$$x_1 + \bar{x}_2 + x_3$$

is TRUE if $x_1$ is TRUE, or if $x_2$ is FALSE, or if $x_3$ is TRUE. That is, the symbol
“+” means “or” and $\bar{x}$ is TRUE if $x$ is FALSE and vice versa. A Boolean formula
is a combination of clauses such as

$$(x_1 + \bar{x}_2) * (x_1 + x_3) * (x_2 + \bar{x}_3) * (\bar{x}_1 + \bar{x}_2) * (x_1 + x_2)$$

In the preceding, the terms between the parentheses represent clauses, and the
formula is TRUE if all the clauses are TRUE, and is FALSE otherwise. For a
given Boolean formula, the satisfiability problem is either to determine values
for the variables that result in the formula being TRUE, or to determine that the
formula is never TRUE. For instance, one set of values that makes the preceding
formula TRUE is to set $x_1$ = TRUE, $x_2$ = FALSE, and $x_3$ = FALSE.

Consider a formula of the $n$ Boolean variables $x_1, \ldots, x_n$ and suppose that
each clause in this formula refers to exactly two variables. We will now present
a probabilistic algorithm that will either find values that satisfy the formula or
determine to a high probability that it is not possible to satisfy it. To begin, start
with an arbitrary setting of values. Then, at each stage choose a clause whose
value is FALSE, and randomly choose one of the Boolean variables in that clause
and change its value. That is, if the variable has value TRUE then change its
value to FALSE, and vice versa. If this new setting makes the formula TRUE
then stop, otherwise continue in the same fashion. If you have not stopped after
$n^2(1 + 4\sqrt{2/3})$ repetitions, then declare that the formula cannot be satisfied. We
will now argue that if there is a satisfiable assignment then this algorithm will
find such an assignment with a probability very close to 1.

Let us start by assuming that there is a satisfiable assignment of truth values
and let $\mathcal{A}$ be such an assignment. At each stage of the algorithm there is a certain
assignment of values. Let $Y_j$ denote the number of the variables whose values at
the $j$th stage of the algorithm agree with their values in $\mathcal{A}$. For instance, suppose
that $n = 3$ and $\mathcal{A}$ consists of the settings $x_1 = x_2 = x_3 =$ TRUE. If the
assignment of values at the $j$th step of the algorithm is $x_1 =$ TRUE, $x_2 = x_3 =$ FALSE,
then $Y_j = 1$. Now, at each stage, the algorithm considers a clause that is not
satisfied, thus implying that at least one of the values of the two variables in this
clause does not agree with its value in $\mathcal{A}$. As a result, when we randomly choose
one of the variables in this clause then there is a probability of at least $\tfrac{1}{2}$ that
$Y_{j+1} = Y_j + 1$ and at most $\tfrac{1}{2}$ that $Y_{j+1} = Y_j - 1$. That is, independent of what
has previously transpired in the algorithm, at each stage the number of settings
in agreement with those in $\mathcal{A}$ will either increase or decrease by 1 and the
probability of an increase is at least $\tfrac{1}{2}$ (it is 1 if both variables have values different
from their values in $\mathcal{A}$). Thus, even though the process $Y_j$, $j \ge 0$ is not itself a
Markov chain (why not?) it is intuitively clear that both the expectation and the
variance of the number of stages of the algorithm needed to obtain the values of
$\mathcal{A}$ will be less than or equal to the expectation and variance of the number of
transitions to go from state 0 to state $n$ in the Markov chain introduced at the
start of this section when $p = \tfrac{1}{2}$.
Hence, if the algorithm has not yet terminated because it found a set of
satisfiable values different from that of $\mathcal{A}$, it will do so within an expected time of
at most $n^2$ and with a standard deviation of at most $n^2\sqrt{2/3}$. In addition, since
the time for the Markov chain to go from 0 to $n$ is approximately normal when
$n$ is large we can be quite certain that a satisfiable assignment will be reached
by $n^2 + 4(n^2\sqrt{2/3})$ stages, and thus if one has not been found by this number
of stages of the algorithm we can be quite certain that there is no satisfiable
assignment.
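The algorithm just analyzed is short enough to write out. The following is a minimal sketch, not a definitive implementation: the encoding of a clause as a pair of `(variable, wanted_value)` literals, the function names, the all-FALSE starting point, and the rule of always working on the first FALSE clause are assumptions made here for illustration. The small formula `formula` is an illustrative instance whose only satisfying assignment is $x_1 =$ TRUE, $x_2 = x_3 =$ FALSE.

```python
import math
import random

def clause_is_true(clause, values):
    """A clause is a pair of literals (variable, wanted_value); it is TRUE if either literal matches."""
    return any(values[var] == wanted for var, wanted in clause)

def random_walk_2sat(n, clauses, max_steps=None, rng=random):
    """Flip a random variable of a FALSE clause until the formula is TRUE or the step budget runs out.

    Returns a satisfying assignment, or None to declare the formula (probably) unsatisfiable.
    """
    if max_steps is None:
        # Step budget n^2 (1 + 4 sqrt(2/3)) from the analysis above.
        max_steps = math.ceil(n * n * (1 + 4 * math.sqrt(2 / 3)))
    values = {var: False for var in range(1, n + 1)}   # arbitrary initial setting
    for _ in range(max_steps):
        false_clauses = [c for c in clauses if not clause_is_true(c, values)]
        if not false_clauses:
            return values
        var, _wanted = rng.choice(false_clauses[0])     # random variable of a FALSE clause
        values[var] = not values[var]
    if all(clause_is_true(c, values) for c in clauses):
        return values
    return None

# (x1 or not x2) and (x1 or x3) and (x2 or not x3) and (not x1 or not x2) and (x1 or x2)
formula = [((1, True), (2, False)), ((1, True), (3, True)), ((2, True), (3, False)),
           ((1, False), (2, False)), ((1, True), (2, True))]
# All four sign patterns on two variables: no assignment satisfies every clause.
unsatisfiable = [((1, True), (2, True)), ((1, True), (2, False)),
                 ((1, False), (2, True)), ((1, False), (2, False))]

print(random_walk_2sat(3, formula, max_steps=10_000))
print(random_walk_2sat(2, unsatisfiable))
```

With the default budget a satisfiable formula can, with small probability, be wrongly declared unsatisfiable; a returned assignment, on the other hand, is always a genuine solution.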

Our analysis also makes it clear why we assumed that there are only two
variables in each clause. For if there were $k$, $k > 2$, variables in a clause then as
any clause that is not presently satisfied may have only one incorrect setting, a
randomly chosen variable whose value is changed might only increase the number
of values in agreement with $\mathcal{A}$ with probability $1/k$ and so we could only conclude
from our prior Markov chain results that the mean time to obtain the values in
$\mathcal{A}$ is an exponential function of $n$, which is not an efficient algorithm when $n$ is
large. ■

4.6 Mean Time Spent in Transient States


Consider now a finite state Markov chain and suppose that the states are
numbered so that $T = \{1, 2, \ldots, t\}$ denotes the set of transient states. Let

$$P_T = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1t} \\ \vdots & \vdots & & \vdots \\ P_{t1} & P_{t2} & \cdots & P_{tt} \end{bmatrix}$$

and note that since $P_T$ specifies only the transition probabilities from transient
states into transient states, some of its row sums are less than 1 (otherwise,
$T$ would be a closed class of states).

For transient states $i$ and $j$, let $s_{ij}$ denote the expected number of time periods
that the Markov chain is in state $j$, given that it starts in state $i$. Let $\delta_{i,j} = 1$ when
$i = j$ and let it be 0 otherwise. Condition on the initial transition to obtain

$$\begin{aligned}
s_{ij} &= \delta_{i,j} + \sum_k P_{ik}s_{kj} \\
&= \delta_{i,j} + \sum_{k=1}^{t} P_{ik}s_{kj}
\end{aligned} \qquad (4.18)$$

where the final equality follows since it is impossible to go from a recurrent to a
transient state, implying that $s_{kj} = 0$ when $k$ is a recurrent state.

Let $S$ denote the matrix of values $s_{ij}$, $i, j = 1, \ldots, t$. That is,

$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1t} \\ \vdots & \vdots & & \vdots \\ s_{t1} & s_{t2} & \cdots & s_{tt} \end{bmatrix}$$

In matrix notation, Equation (4.18) can be written as

$$S = I + P_T S$$

where $I$ is the identity matrix of size $t$. Because the preceding equation is
equivalent to

$$(I - P_T)S = I$$

we obtain, upon multiplying both sides by $(I - P_T)^{-1}$,

$$S = (I - P_T)^{-1}$$

That is, the quantities $s_{ij}$, $i \in T$, $j \in T$, can be obtained by inverting the matrix
$I - P_T$. (The existence of the inverse is easily established.)


Example 4.30 Consider the gambler’s ruin problem with p = 0.4 and N = 7. 
Starting with 3 units, determine 


(a) the expected amount of time the gambler has 5 units, 
(b) the expected amount of time the gambler has 2 units. 


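Solution: The matrix $P_T$, which specifies $P_{ij}$, $i, j \in \{1, 2, 3, 4, 5, 6\}$, is as
follows (rows and columns are indexed by the states 1 through 6):

$$P_T = \begin{bmatrix}
0 & 0.4 & 0 & 0 & 0 & 0 \\
0.6 & 0 & 0.4 & 0 & 0 & 0 \\
0 & 0.6 & 0 & 0.4 & 0 & 0 \\
0 & 0 & 0.6 & 0 & 0.4 & 0 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.6 & 0
\end{bmatrix}$$

Inverting $I - P_T$ gives

$$S = (I - P_T)^{-1} = \begin{bmatrix}
1.6149 & 1.0248 & 0.6314 & 0.3691 & 0.1943 & 0.0777 \\
1.5372 & 2.5619 & 1.5784 & 0.9228 & 0.4857 & 0.1943 \\
1.4206 & 2.3677 & 2.9990 & 1.7533 & 0.9228 & 0.3691 \\
1.2458 & 2.0763 & 2.6299 & 2.9990 & 1.5784 & 0.6314 \\
0.9835 & 1.6391 & 2.0763 & 2.3677 & 2.5619 & 1.0248 \\
0.5901 & 0.9835 & 1.2458 & 1.4206 & 1.5372 & 1.6149
\end{bmatrix}$$

Hence,

$$s_{3,5} = 0.9228, \quad s_{3,2} = 2.3677$$ ■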
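The inversion in Example 4.30 can be reproduced with a few lines of code. The sketch below builds $I - P_T$ for the gambler's ruin chain and inverts it by Gauss-Jordan elimination written out in plain Python; the function names are illustrative assumptions, and any linear algebra routine could be used instead.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mean_visits_gamblers_ruin(p, big_n):
    """S = (I - P_T)^{-1} for the transient states 1..N-1 of the gambler's ruin chain."""
    t = big_n - 1
    q = 1 - p
    i_minus_pt = [[0.0] * t for _ in range(t)]
    for i in range(t):
        i_minus_pt[i][i] = 1.0
        if i + 1 < t:
            i_minus_pt[i][i + 1] = -p   # win one unit
        if i - 1 >= 0:
            i_minus_pt[i][i - 1] = -q   # lose one unit
    return invert(i_minus_pt)

S = mean_visits_gamblers_ruin(0.4, 7)
# State k is stored at index k - 1, so s_{3,5} is S[2][4] and s_{3,2} is S[2][1].
print(round(S[2][4], 4), round(S[2][1], 4))
```

Because list indices start at 0, the entry $s_{ij}$ of the text corresponds to `S[i - 1][j - 1]`.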


For $i \in T$, $j \in T$, the quantity $f_{ij}$, equal to the probability that the Markov
chain ever makes a transition into state $j$ given that it starts in state $i$, is
easily determined from $P_T$. To determine the relationship, let us start by deriving
an expression for $s_{ij}$ by conditioning on whether state $j$ is ever entered. This
yields

$$\begin{aligned}
s_{ij} &= E[\text{time in } j \mid \text{start in } i, \text{ever transit to } j]\,f_{ij} \\
&\quad + E[\text{time in } j \mid \text{start in } i, \text{never transit to } j]\,(1 - f_{ij}) \\
&= (\delta_{i,j} + s_{jj})f_{ij} + \delta_{i,j}(1 - f_{ij}) \\
&= \delta_{i,j} + f_{ij}s_{jj}
\end{aligned}$$

since $s_{jj}$ is the expected number of additional time periods spent in state $j$
given that it is eventually entered from state $i$. Solving the preceding equation
yields

$$f_{ij} = \frac{s_{ij} - \delta_{i,j}}{s_{jj}}$$


Example 4.31 In Example 4.30, what is the probability that the gambler ever 
has a fortune of 1? 


Solution: Since $s_{3,1} = 1.4206$ and $s_{1,1} = 1.6149$, then

$$f_{3,1} = \frac{s_{3,1}}{s_{1,1}} = 0.8797$$

As a check, note that $f_{3,1}$ is just the probability that a gambler starting with
3 reaches 1 before 7. That is, it is the probability that the gambler's fortune will
go down 2 before going up 4; which is the probability that a gambler starting
with 2 will go broke before reaching 6. Therefore,

$$f_{3,1} = 1 - \frac{1 - (0.6/0.4)^2}{1 - (0.6/0.4)^6} = 0.8797$$

which checks with our earlier answer. ■


Suppose we are interested in the expected time until the Markov chain enters
some set of states $A$, which need not be the set of recurrent states. We can reduce
this back to the previous situation by making all states in $A$ absorbing states. That
is, reset the transition probabilities of states in $A$ to satisfy

$$P_{i,i} = 1, \quad i \in A$$

This transforms the states of $A$ into recurrent states, and transforms any state
outside of $A$ from which an eventual transition into $A$ is possible into a transient
state. Thus, our previous approach can be used.


4.7 Branching Processes 


In this section we consider a class of Markov chains, known as branching pro- 
cesses, which have a wide variety of applications in the biological, sociological, 
and engineering sciences. 

Consider a population consisting of individuals able to produce offspring of
the same kind. Suppose that each individual will, by the end of its lifetime, have
produced $j$ new offspring with probability $P_j$, $j \ge 0$, independently of the numbers
produced by other individuals. We suppose that $P_j < 1$ for all $j \ge 0$. The number
of individuals initially present, denoted by $X_0$, is called the size of the zeroth
generation. All offspring of the zeroth generation constitute the first generation
and their number is denoted by $X_1$. In general, let $X_n$ denote the size of the $n$th
generation. It follows that $\{X_n, n = 0, 1, \ldots\}$ is a Markov chain having as its state
space the set of nonnegative integers.

Note that state 0 is a recurrent state, since clearly $P_{00} = 1$. Also, if $P_0 > 0$,
all other states are transient. This follows since $P_{i0} = P_0^i$, which implies that
starting with $i$ individuals there is a positive probability of at least $P_0^i$ that no
later generation will ever consist of $i$ individuals. Moreover, since any finite set
of transient states $\{1, 2, \ldots, n\}$ will be visited only finitely often, this leads to the
important conclusion that, if $P_0 > 0$, then the population will either die out or
its size will converge to infinity.

Let

$$\mu = \sum_{j=0}^{\infty} jP_j$$

denote the mean number of offspring of a single individual, and let

$$\sigma^2 = \sum_{j=0}^{\infty} (j - \mu)^2 P_j$$

be the variance of the number of offspring produced by a single individual.
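Let us suppose that $X_0 = 1$, that is, initially there is a single individual present.
We calculate $E[X_n]$ and $\operatorname{Var}(X_n)$ by first noting that we may write

$$X_n = \sum_{i=1}^{X_{n-1}} Z_i$$

where $Z_i$ represents the number of offspring of the $i$th individual of the $(n-1)$st
generation. By conditioning on $X_{n-1}$, we obtain

$$\begin{aligned}
E[X_n] &= E[E[X_n \mid X_{n-1}]] \\
&= E\left[E\left[\sum_{i=1}^{X_{n-1}} Z_i \,\Big|\, X_{n-1}\right]\right] \\
&= E[X_{n-1}\mu] \\
&= \mu E[X_{n-1}]
\end{aligned}$$

where we have used the fact that $E[Z_i] = \mu$. Since $E[X_0] = 1$, the preceding
yields

$$\begin{aligned}
E[X_1] &= \mu, \\
E[X_2] &= \mu E[X_1] = \mu^2, \\
&\;\;\vdots \\
E[X_n] &= \mu E[X_{n-1}] = \mu^n
\end{aligned}$$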
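Similarly, $\operatorname{Var}(X_n)$ may be obtained by using the conditional variance formula

$$\operatorname{Var}(X_n) = E[\operatorname{Var}(X_n \mid X_{n-1})] + \operatorname{Var}(E[X_n \mid X_{n-1}])$$

Now, given $X_{n-1}$, $X_n$ is just the sum of $X_{n-1}$ independent random variables each
having the distribution $\{P_j, j \ge 0\}$. Hence,

$$E[X_n \mid X_{n-1}] = X_{n-1}\mu, \qquad \operatorname{Var}(X_n \mid X_{n-1}) = X_{n-1}\sigma^2$$

The conditional variance formula now yields

$$\begin{aligned}
\operatorname{Var}(X_n) &= E[X_{n-1}\sigma^2] + \operatorname{Var}(X_{n-1}\mu) \\
&= \sigma^2\mu^{n-1} + \mu^2\operatorname{Var}(X_{n-1}) \\
&= \sigma^2\mu^{n-1} + \mu^2\left(\sigma^2\mu^{n-2} + \mu^2\operatorname{Var}(X_{n-2})\right) \\
&= \sigma^2(\mu^{n-1} + \mu^n) + \mu^4\operatorname{Var}(X_{n-2}) \\
&= \sigma^2(\mu^{n-1} + \mu^n) + \mu^4\left(\sigma^2\mu^{n-3} + \mu^2\operatorname{Var}(X_{n-3})\right) \\
&= \sigma^2(\mu^{n-1} + \mu^n + \mu^{n+1}) + \mu^6\operatorname{Var}(X_{n-3}) \\
&= \cdots \\
&= \sigma^2(\mu^{n-1} + \mu^n + \cdots + \mu^{2n-2}) + \mu^{2n}\operatorname{Var}(X_0) \\
&= \sigma^2(\mu^{n-1} + \mu^n + \cdots + \mu^{2n-2})
\end{aligned}$$

Therefore,

$$\operatorname{Var}(X_n) = \begin{cases} \sigma^2\mu^{n-1}\left(\dfrac{1 - \mu^n}{1 - \mu}\right), & \text{if } \mu \ne 1 \\[2ex] n\sigma^2, & \text{if } \mu = 1 \end{cases} \qquad (4.19)$$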
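Equation (4.19) can be checked against the one-step recursion $\operatorname{Var}(X_n) = \sigma^2\mu^{n-1} + \mu^2\operatorname{Var}(X_{n-1})$ from which it was derived. The sketch below uses exact fractions so that the case $\mu = 1$ is detected reliably; the function names and the two sample offspring distributions are illustrative assumptions.

```python
from fractions import Fraction

def offspring_moments(dist):
    """Mean and variance of the offspring distribution, where dist[j] = P_j."""
    mu = sum(j * pj for j, pj in enumerate(dist))
    sigma2 = sum((j - mu) ** 2 * pj for j, pj in enumerate(dist))
    return mu, sigma2

def generation_variance_recursive(dist, n):
    """Var(X_n) from Var(X_k) = sigma^2 mu^(k-1) + mu^2 Var(X_{k-1}), Var(X_0) = 0."""
    mu, sigma2 = offspring_moments(dist)
    var = Fraction(0)
    for k in range(1, n + 1):
        var = sigma2 * mu ** (k - 1) + mu ** 2 * var
    return var

def generation_variance_closed_form(dist, n):
    """Var(X_n) from Equation (4.19)."""
    mu, sigma2 = offspring_moments(dist)
    if mu == 1:
        return n * sigma2
    return sigma2 * mu ** (n - 1) * (1 - mu ** n) / (1 - mu)

growing = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]    # mu = 5/4
critical = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # mu = 1
for dist in (growing, critical):
    print(generation_variance_recursive(dist, 10), generation_variance_closed_form(dist, 10))
```

When $\mu = 1$ the variance grows only linearly in $n$, whereas for $\mu > 1$ it grows like $\mu^{2n}$.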


Let $\pi_0$ denote the probability that the population will eventually die out (under
the assumption that $X_0 = 1$). More formally,

$$\pi_0 = \lim_{n\to\infty} P\{X_n = 0 \mid X_0 = 1\}$$

The problem of determining the value of $\pi_0$ was first raised in connection with
the extinction of family surnames by Galton in 1889.

We first note that $\pi_0 = 1$ if $\mu < 1$. This follows since

$$\begin{aligned}
\mu^n = E[X_n] &= \sum_{j=1}^{\infty} jP\{X_n = j\} \\
&\ge \sum_{j=1}^{\infty} 1 \cdot P\{X_n = j\} \\
&= P\{X_n \ge 1\}
\end{aligned}$$

Since $\mu^n \to 0$ when $\mu < 1$, it follows that $P\{X_n \ge 1\} \to 0$, and hence
$P\{X_n = 0\} \to 1$.

In fact, it can be shown that $\pi_0 = 1$ even when $\mu = 1$. When $\mu > 1$, it turns
out that $\pi_0 < 1$, and an equation determining $\pi_0$ may be derived by conditioning
on the number of offspring of the initial individual, as follows:

$$\begin{aligned}
\pi_0 &= P\{\text{population dies out}\} \\
&= \sum_{j=0}^{\infty} P\{\text{population dies out} \mid X_1 = j\}P_j
\end{aligned}$$

Now, given that $X_1 = j$, the population will eventually die out if and only if each
of the $j$ families started by the members of the first generation eventually dies out.
Since each family is assumed to act independently, and since the probability that
any particular family dies out is just $\pi_0$, this yields

$$P\{\text{population dies out} \mid X_1 = j\} = \pi_0^j$$

and thus $\pi_0$ satisfies

$$\pi_0 = \sum_{j=0}^{\infty} \pi_0^j P_j \qquad (4.20)$$

In fact when $\mu > 1$, it can be shown that $\pi_0$ is the smallest positive number
satisfying Equation (4.20).


Example 4.32 If $P_0 = \tfrac{1}{2}$, $P_1 = \tfrac{1}{4}$, $P_2 = \tfrac{1}{4}$, then determine $\pi_0$.


Solution: Since $\mu = \tfrac{3}{4} \le 1$, it follows that $\pi_0 = 1$. ■


Example 4.33 If $P_0 = \tfrac{1}{4}$, $P_1 = \tfrac{1}{4}$, $P_2 = \tfrac{1}{2}$, then determine $\pi_0$.


Solution: $\pi_0$ satisfies

$$\pi_0 = \tfrac{1}{4} + \tfrac{1}{4}\pi_0 + \tfrac{1}{2}\pi_0^2$$

or

$$2\pi_0^2 - 3\pi_0 + 1 = 0$$

The smallest positive solution of this quadratic equation is $\pi_0 = \tfrac{1}{2}$. ■


Example 4.34 In Examples 4.32 and 4.33, what is the probability that the
population will die out if it initially consists of $n$ individuals?


Solution: Since the population will die out if and only if the families of each
of the members of the initial generation die out, the desired probability is $\pi_0^n$.
For Example 4.32 this yields $\pi_0^n = 1$, and for Example 4.33, $\pi_0^n = \left(\tfrac{1}{2}\right)^n$. ■
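When the offspring distribution has more than three support points, Equation (4.20) is usually solved numerically. One common approach, sketched below, iterates $\pi \leftarrow \sum_j \pi^j P_j$ starting from $\pi = 0$. This relies on a standard fact not spelled out in the text: the $k$th iterate equals $P\{X_k = 0 \mid X_0 = 1\}$, so the iterates increase to $\pi_0$. The function name and the iteration count are illustrative assumptions.

```python
def extinction_probability(dist, iterations=10_000):
    """Approximate pi_0 by iterating pi <- sum_j P_j * pi^j from pi = 0, where dist[j] = P_j.

    After k iterations pi equals P{X_k = 0 | X_0 = 1}, which increases to pi_0.
    """
    pi = 0.0
    for _ in range(iterations):
        pi = sum(pj * pi ** j for j, pj in enumerate(dist))
    return pi

example_4_32 = [1 / 2, 1 / 4, 1 / 4]   # mu = 3/4, so extinction is certain
example_4_33 = [1 / 4, 1 / 4, 1 / 2]   # mu = 5/4
print(extinction_probability(example_4_32))
print(extinction_probability(example_4_33))
print(extinction_probability(example_4_33) ** 3)   # start with n = 3 individuals
```

Convergence is slow when $\mu$ is close to 1, so in that regime a larger iteration count or a root finder applied to Equation (4.20) may be preferable.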


4.8 Time Reversible Markov Chains 


Consider a stationary ergodic Markov chain (that is, an ergodic Markov chain
that has been in operation for a long time) having transition probabilities $P_{ij}$ and
stationary probabilities $\pi_i$, and suppose that starting at some time we trace the
sequence of states going backward in time. That is, starting at time $n$, consider
the sequence of states $X_n, X_{n-1}, X_{n-2}, \ldots$. It turns out that this sequence of states
is itself a Markov chain with transition probabilities $Q_{ij}$ defined by

$$\begin{aligned}
Q_{ij} &= P\{X_m = j \mid X_{m+1} = i\} \\
&= \frac{P\{X_m = j, X_{m+1} = i\}}{P\{X_{m+1} = i\}} \\
&= \frac{P\{X_m = j\}P\{X_{m+1} = i \mid X_m = j\}}{P\{X_{m+1} = i\}} \\
&= \frac{\pi_j P_{ji}}{\pi_i}
\end{aligned}$$
To prove that the reversed process is indeed a Markov chain, we must verify that

$$P\{X_m = j \mid X_{m+1} = i, X_{m+2}, X_{m+3}, \ldots\} = P\{X_m = j \mid X_{m+1} = i\}$$

To see that this is so, suppose that the present time is $m + 1$. Now, since
$X_0, X_1, X_2, \ldots$ is a Markov chain, it follows that the conditional distribution
of the future $X_{m+2}, X_{m+3}, \ldots$ given the present state $X_{m+1}$ is independent of
the past state $X_m$. However, independence is a symmetric relationship (that is,
if $A$ is independent of $B$, then $B$ is independent of $A$), and so this means that given
$X_{m+1}$, $X_m$ is independent of $X_{m+2}, X_{m+3}, \ldots$. But this is exactly what we had to
verify.

Thus, the reversed process is also a Markov chain with transition probabilities
given by

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$

If $Q_{ij} = P_{ij}$ for all $i, j$, then the Markov chain is said to be time reversible. The
condition for time reversibility, namely, $Q_{ij} = P_{ij}$, can also be expressed as

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j \qquad (4.21)$$

The condition in Equation (4.21) can be stated that, for all states $i$ and $j$, the rate
at which the process goes from $i$ to $j$ (namely, $\pi_i P_{ij}$) is equal to the rate at which it
goes from $j$ to $i$ (namely, $\pi_j P_{ji}$). It is worth noting that this is an obvious necessary
condition for time reversibility since a transition from $i$ to $j$ going backward in
time is equivalent to a transition from $j$ to $i$ going forward in time; that is, if
$X_m = i$ and $X_{m-1} = j$, then a transition from $i$ to $j$ is observed if we are looking
backward, and one from $j$ to $i$ if we are looking forward in time. Thus, the rate
at which the forward process makes a transition from $j$ to $i$ is always equal to the
rate at which the reverse process makes a transition from $i$ to $j$; if time reversible,
this must equal the rate at which the forward process makes a transition
from $i$ to $j$.

If we can find nonnegative numbers, summing to one, that satisfy
Equation (4.21), then it follows that the Markov chain is time reversible and the
numbers represent the limiting probabilities. This is so since if

$$x_i P_{ij} = x_j P_{ji} \quad \text{for all } i, j, \qquad \sum_i x_i = 1 \qquad (4.22)$$

then summing over $i$ yields

$$\sum_i x_i P_{ij} = x_j \sum_i P_{ji} = x_j, \qquad \sum_i x_i = 1$$

and, because the limiting probabilities $\pi_i$ are the unique solution of the preceding,
it follows that $x_i = \pi_i$ for all $i$.
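The definitions above translate directly into a numerical check. The following sketch computes the reversed-chain probabilities $Q_{ij} = \pi_j P_{ji}/\pi_i$ and tests the condition $\pi_i P_{ij} = \pi_j P_{ji}$; the stationary probabilities are approximated by repeatedly applying the transition matrix, which assumes an aperiodic irreducible chain. The function names and the two small example chains are assumptions made for illustration.

```python
def stationary_distribution(P, iterations=5000):
    """Approximate the stationary probabilities of an aperiodic irreducible chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def reversed_chain(P, pi):
    """Q[i][j] = pi[j] * P[j][i] / pi[i], the transition probabilities of the reversed chain."""
    n = len(P)
    return [[pi[j] * P[j][i] / pi[i] for j in range(n)] for i in range(n)]

def is_time_reversible(P, pi, tol=1e-9):
    """Check pi_i P_ij = pi_j P_ji for every pair of states."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))

# Moves only between neighboring states: time reversible.
nearest_neighbor = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
# Circulates 0 -> 1 -> 2 -> 0 far more often than the other way: not time reversible.
mostly_clockwise = [[0.0, 0.9, 0.1], [0.1, 0.0, 0.9], [0.9, 0.1, 0.0]]

for P in (nearest_neighbor, mostly_clockwise):
    pi = stationary_distribution(P)
    print(pi, is_time_reversible(P, pi), reversed_chain(P, pi)[0])
```

For the second chain the reversed process circulates in the opposite direction, so its transition matrix differs from that of the forward chain.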


Example 4.35 Consider a random walk with states $0, 1, \ldots, M$ and transition
probabilities

$$\begin{aligned}
P_{i,i+1} &= \alpha_i = 1 - P_{i,i-1}, \quad i = 1, \ldots, M-1, \\
P_{0,1} &= \alpha_0 = 1 - P_{0,0}, \\
P_{M,M} &= \alpha_M = 1 - P_{M,M-1}
\end{aligned}$$

Without the need for any computations, it is possible to argue that this Markov
chain, which can only make transitions from a state to one of its two nearest
neighbors, is time reversible. This follows by noting that the number of transitions
from $i$ to $i+1$ must at all times be within 1 of the number from $i+1$ to $i$.
This is so because between any two transitions from $i$ to $i+1$ there must be one
from $i+1$ to $i$ (and conversely) since the only way to reenter $i$ from a higher
state is via state $i+1$. Hence, it follows that the rate of transitions from $i$ to $i+1$
equals the rate from $i+1$ to $i$, and so the process is time reversible.

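We can easily obtain the limiting probabilities by equating for each state
$i = 0, 1, \ldots, M-1$ the rate at which the process goes from $i$ to $i+1$ with the rate at
which it goes from $i+1$ to $i$. This yields

$$\begin{aligned}
\pi_0\alpha_0 &= \pi_1(1 - \alpha_1), \\
\pi_1\alpha_1 &= \pi_2(1 - \alpha_2), \\
&\;\;\vdots \\
\pi_i\alpha_i &= \pi_{i+1}(1 - \alpha_{i+1}), \quad i = 0, 1, \ldots, M-1
\end{aligned}$$

Solving in terms of $\pi_0$ yields

$$\begin{aligned}
\pi_1 &= \frac{\alpha_0}{1 - \alpha_1}\pi_0, \\
\pi_2 &= \frac{\alpha_1}{1 - \alpha_2}\pi_1 = \frac{\alpha_1\alpha_0}{(1 - \alpha_2)(1 - \alpha_1)}\pi_0
\end{aligned}$$

and, in general,

$$\pi_i = \frac{\alpha_{i-1}\cdots\alpha_0}{(1 - \alpha_i)\cdots(1 - \alpha_1)}\pi_0, \quad i = 1, 2, \ldots, M$$

Since $\sum_{j=0}^{M} \pi_j = 1$, we obtain

$$\pi_0\left[1 + \sum_{j=1}^{M} \frac{\alpha_{j-1}\cdots\alpha_0}{(1 - \alpha_j)\cdots(1 - \alpha_1)}\right] = 1$$

or

$$\pi_0 = \left[1 + \sum_{j=1}^{M} \frac{\alpha_{j-1}\cdots\alpha_0}{(1 - \alpha_j)\cdots(1 - \alpha_1)}\right]^{-1} \qquad (4.23)$$

and

$$\pi_i = \frac{\alpha_{i-1}\cdots\alpha_0}{(1 - \alpha_i)\cdots(1 - \alpha_1)}\pi_0, \quad i = 1, \ldots, M \qquad (4.24)$$

For instance, if $\alpha_i \equiv \alpha$, then

$$\begin{aligned}
\pi_0 &= \left[1 + \sum_{j=1}^{M} \left(\frac{\alpha}{1 - \alpha}\right)^j\right]^{-1} \\
&= \frac{1 - \beta}{1 - \beta^{M+1}}
\end{aligned}$$

and, in general,

$$\pi_i = \frac{\beta^i(1 - \beta)}{1 - \beta^{M+1}}, \quad i = 0, 1, \ldots, M$$

where

$$\beta = \frac{\alpha}{1 - \alpha}$$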

Another special case of Example 4.35 is the following urn model, proposed
by the physicists P. and T. Ehrenfest to describe the movements of molecules.
Suppose that $M$ molecules are distributed among two urns; and at each time point
one of the molecules is chosen at random, removed from its urn, and placed in
the other one. The number of molecules in urn I is a special case of the Markov
chain of Example 4.35 having

$$\alpha_i = \frac{M - i}{M}, \quad i = 0, 1, \ldots, M$$

Hence, using Equations (4.23) and (4.24) the limiting probabilities in this case are

$$\begin{aligned}
\pi_0 &= \left[1 + \sum_{j=1}^{M} \frac{(M - j + 1)\cdots(M - 1)M}{j(j - 1)\cdots 1}\right]^{-1} \\
&= \left[1 + \sum_{j=1}^{M} \binom{M}{j}\right]^{-1} \\
&= \left(\frac{1}{2}\right)^M
\end{aligned}$$

where we have used the identity

$$1 = \left(\frac{1}{2} + \frac{1}{2}\right)^M = \sum_{j=0}^{M} \binom{M}{j}\left(\frac{1}{2}\right)^M$$

Hence, from Equation (4.24),

$$\pi_i = \binom{M}{i}\left(\frac{1}{2}\right)^M, \quad i = 0, 1, \ldots, M$$

Because the preceding are just the binomial probabilities, it follows that in the
long run, the positions of each of the $M$ balls are independent and each one is
equally likely to be in either urn. This, however, is quite intuitive, for if we focus
on any one ball, it becomes quite clear that its position will be independent of
the positions of the other balls (since no matter where the other $M - 1$ balls are,
the ball under consideration at each stage will be moved with probability $1/M$)
and by symmetry, it is equally likely to be in either urn.
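Equations (4.23) and (4.24) amount to a running product followed by a normalization, which the following sketch carries out for an arbitrary sequence $\alpha_0, \ldots, \alpha_M$. It then checks the two special cases discussed above; the function name and the particular values of $M$ and $\alpha$ are illustrative assumptions.

```python
from math import comb

def random_walk_limiting_probabilities(alpha):
    """Limiting probabilities from Equations (4.23) and (4.24), where alpha[i] = P_{i,i+1}
    for i < M and alpha[M] = P_{M,M}; requires alpha[i] < 1 for i >= 1."""
    M = len(alpha) - 1
    weights = [1.0]   # weights[i] = alpha_{i-1}...alpha_0 / ((1 - alpha_i)...(1 - alpha_1))
    for i in range(1, M + 1):
        weights.append(weights[-1] * alpha[i - 1] / (1 - alpha[i]))
    total = sum(weights)
    return [w / total for w in weights]

# Ehrenfest urn model: alpha_i = (M - i) / M gives binomial(M, 1/2) probabilities.
M = 6
ehrenfest = random_walk_limiting_probabilities([(M - i) / M for i in range(M + 1)])
print(ehrenfest)
print([comb(M, i) / 2 ** M for i in range(M + 1)])

# Constant alpha: pi_i = beta^i (1 - beta) / (1 - beta^(M+1)) with beta = alpha / (1 - alpha).
constant = random_walk_limiting_probabilities([0.3] * 5)
print(constant)
```

Working with the unnormalized weights first avoids computing $\pi_0$ separately: dividing by their sum is exactly Equation (4.23).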


Example 4.36 Consider an arbitrary connected graph (see Section 3.6 for
definitions) having a number $w_{ij}$ associated with arc $(i, j)$ for each arc. One
instance of such a graph is given by Figure 4.1. Now consider a particle moving
from node to node in this manner: If at any time the particle resides at node $i$,
then it will next move to node $j$ with probability $P_{ij}$ where

$$P_{ij} = \frac{w_{ij}}{\sum_j w_{ij}}$$

and where $w_{ij}$ is 0 if $(i, j)$ is not an arc. For instance, for the graph of Figure 4.1,
$P_{12} = 3/(3 + 1 + 2) = \tfrac{1}{2}$.

Figure 4.1. A connected graph with arc weights.

The time reversibility equations

$$\pi_i P_{ij} = \pi_j P_{ji}$$

reduce to

$$\pi_i \frac{w_{ij}}{\sum_j w_{ij}} = \pi_j \frac{w_{ji}}{\sum_i w_{ji}}$$

or, equivalently, since $w_{ij} = w_{ji}$

$$\frac{\pi_i}{\sum_j w_{ij}} = \frac{\pi_j}{\sum_i w_{ji}}$$

which is equivalent to

$$\frac{\pi_i}{\sum_j w_{ij}} = c$$

or

$$\pi_i = c\sum_j w_{ij}$$

or, since $1 = \sum_i \pi_i$

$$\pi_i = \frac{\sum_j w_{ij}}{\sum_i \sum_j w_{ij}}$$

Because the $\pi_i$s given by this equation satisfy the time reversibility equations, it
follows that the process is time reversible with these limiting probabilities.
For the graph of Figure 4.1 we have that

$$\pi_1 = \tfrac{6}{32}, \quad \pi_2 = \tfrac{3}{32}, \quad \pi_3 = \tfrac{6}{32}, \quad \pi_4 = \tfrac{5}{32}, \quad \pi_5 = \tfrac{12}{32}$$ ■
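The formula $\pi_i = \sum_j w_{ij} / \sum_i \sum_j w_{ij}$ is simple to apply in code. The sketch below does so for a small weighted triangle; this graph and its weights are hypothetical and chosen only for illustration (they are not the arc weights of Figure 4.1), and the function name is likewise an assumption.

```python
from fractions import Fraction

def weighted_graph_walk(arc_weights):
    """Transition and limiting probabilities for a particle on an undirected weighted graph.

    arc_weights maps an arc (i, j) to its weight w_ij = w_ji; each arc is listed once.
    """
    w = {}
    for (i, j), weight in arc_weights.items():
        w.setdefault(i, {})[j] = Fraction(weight)
        w.setdefault(j, {})[i] = Fraction(weight)
    node_totals = {i: sum(w[i].values()) for i in w}           # sum_j w_ij
    grand_total = sum(node_totals.values())                     # sum_i sum_j w_ij
    P = {i: {j: w[i][j] / node_totals[i] for j in w[i]} for i in w}
    pi = {i: node_totals[i] / grand_total for i in w}
    return P, pi

# Hypothetical triangle with arc weights 3, 1, and 2.
P, pi = weighted_graph_walk({(1, 2): 3, (1, 3): 1, (2, 3): 2})
print(pi)
for i in P:
    for j in P[i]:
        print(i, j, pi[i] * P[i][j], pi[j] * P[j][i])   # the two rates agree
```

Each printed pair of rates equals $w_{ij}$ divided by the grand total, which is why the time reversibility equations hold for any symmetric weights.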


If we try to solve Equation (4.22) for an arbitrary Markov chain with states
$0, 1, \ldots, M$, it will usually turn out that no solution exists. For example, from
Equation (4.22),

$$\begin{aligned}
x_i P_{ij} &= x_j P_{ji}, \\
x_k P_{kj} &= x_j P_{jk}
\end{aligned}$$

implying (if $P_{ij}P_{jk} > 0$) that

$$\frac{x_i}{x_k} = \frac{P_{ji}P_{kj}}{P_{ij}P_{jk}}$$

which in general need not equal $P_{ki}/P_{ik}$. Thus, we see that a necessary condition
for time reversibility is that

$$P_{ik}P_{kj}P_{ji} = P_{ij}P_{jk}P_{ki} \quad \text{for all } i, j, k \qquad (4.25)$$

which is equivalent to the statement that, starting in state $i$, the path
$i \to k \to j \to i$ has the same probability as the reversed path $i \to j \to k \to i$. To understand
the necessity of this, note that time reversibility implies that the rate at which a
sequence of transitions from $i$ to $k$ to $j$ to $i$ occurs must equal the rate of ones
from $i$ to $j$ to $k$ to $i$ (why?), and so we must have

$$\pi_i P_{ik}P_{kj}P_{ji} = \pi_i P_{ij}P_{jk}P_{ki}$$

implying Equation (4.25) when $\pi_i > 0$.
In fact, we can show the following.
In fact, we can show the following. 


Theorem 4.2 An ergodic Markov chain for which $P_{ij} = 0$ whenever $P_{ji} = 0$ is
time reversible if and only if starting in state $i$, any path back to $i$ has the same
probability as the reversed path. That is, if

$$P_{i,i_1}P_{i_1,i_2}\cdots P_{i_k,i} = P_{i,i_k}P_{i_k,i_{k-1}}\cdots P_{i_1,i} \qquad (4.26)$$

for all states $i, i_1, \ldots, i_k$.


Proof. We have already proven necessity. To prove sufficiency, fix states $i$ and $j$
and rewrite (4.26) as

$$P_{i,i_1}P_{i_1,i_2}\cdots P_{i_k,j}P_{ji} = P_{ij}P_{j,i_k}\cdots P_{i_1,i}$$

Summing the preceding over all states $i_1, \ldots, i_k$ yields

$$P_{ij}^{k+1}P_{ji} = P_{ij}P_{ji}^{k+1}$$

Letting $k \to \infty$ yields

$$\pi_j P_{ji} = P_{ij}\pi_i$$

which proves the theorem. ■


Example 4.37 Suppose we are given a set of $n$ elements, numbered 1 through $n$,
which are to be arranged in some ordered list. At each unit of time a request is
made to retrieve one of these elements, element $i$ being requested (independently
of the past) with probability $P_i$. After being requested, the element then is put
back but not necessarily in the same position. In fact, let us suppose that the
element requested is moved one closer to the front of the list; for instance, if the
present list ordering is 1, 3, 4, 2, 5 and element 2 is requested, then the new
ordering becomes 1, 3, 2, 4, 5. We are interested in the long-run average position
of the element requested.

For any given probability vector $P = (P_1, \ldots, P_n)$, the preceding can be
modeled as a Markov chain with $n!$ states, with the state at any time being the list
order at that time. We shall show that this Markov chain is time reversible and
then use this to show that the average position of the element requested when
this one-closer rule is in effect is less than when the rule of always moving the
requested element to the front of the line is used. The time reversibility of the
resulting Markov chain when the one-closer reordering rule is in effect easily
follows from Theorem 4.2. For instance, suppose $n = 3$ and consider the following
path from state (1, 2, 3) to itself:

$$(1, 2, 3) \to (2, 1, 3) \to (2, 3, 1) \to (3, 2, 1) \to (3, 1, 2) \to (1, 3, 2) \to (1, 2, 3)$$

The product of the transition probabilities in the forward direction is

$$P_2P_3P_3P_1P_1P_2 = P_1^2P_2^2P_3^2$$

whereas in the reverse direction, it is

$$P_3P_3P_2P_2P_1P_1 = P_1^2P_2^2P_3^2$$

Because the general result follows in much the same manner, the Markov chain is
indeed time reversible. (For a formal argument note that if $f_i$ denotes the number
of times element $i$ moves forward in the path, then as the path goes from a fixed
state back to itself, it follows that element $i$ will also move backward $f_i$ times.
Therefore, since the backward moves of element $i$ are precisely the times that it
moves forward in the reverse path, it follows that the product of the transition
probabilities for both the path and its reversal will equal

$$\prod_i P_i^{f_i + r_i}$$

where $r_i$ is equal to the number of times that element $i$ is in the first position and
the path (or the reverse path) does not change states.)

For any permutation $i_1, i_2, \ldots, i_n$ of $1, 2, \ldots, n$, let $\pi(i_1, i_2, \ldots, i_n)$ denote the
limiting probability under the one-closer rule. By time reversibility we have

$$P_{i_{j+1}}\pi(i_1, \ldots, i_j, i_{j+1}, \ldots, i_n) = P_{i_j}\pi(i_1, \ldots, i_{j+1}, i_j, \ldots, i_n) \qquad (4.27)$$

for all permutations.

Now the average position of the element requested can be expressed (as in
Section 3.6.1) as

$$\begin{aligned}
\text{Average position} &= \sum_i P_i E[\text{Position of element } i] \\
&= \sum_i P_i\left[1 + \sum_{j \ne i} P\{\text{element } j \text{ precedes element } i\}\right] \\
&= 1 + \sum_i \sum_{j \ne i} P_i P\{e_j \text{ precedes } e_i\} \\
&= 1 + \sum_{i<j} \left[P_i P\{e_j \text{ precedes } e_i\} + P_j P\{e_i \text{ precedes } e_j\}\right] \\
&= 1 + \sum_{i<j} \left[P_i P\{e_j \text{ precedes } e_i\} + P_j(1 - P\{e_j \text{ precedes } e_i\})\right] \\
&= 1 + \sum\sum_{i<j} (P_i - P_j)P\{e_j \text{ precedes } e_i\} + \sum\sum_{i<j} P_j
\end{aligned}$$

Hence, to minimize the average position of the element requested, we would
want to make $P\{e_j \text{ precedes } e_i\}$ as large as possible when $P_j > P_i$ and as
small as possible when $P_i > P_j$. Under the front-of-the-line rule we showed in
Section 3.6.1,

$$P\{e_j \text{ precedes } e_i\} = \frac{P_j}{P_j + P_i}$$

(since under the front-of-the-line rule element $j$ will precede element $i$ if and only
if the last request for either $i$ or $j$ was for $j$).

Therefore, to show that the one-closer rule is better than the front-of-the-line 
rule, it suffices to show that under the one-closer rule 


P; 
P{e; precedes e;} > P, . P, when P; > P; 


Now consider any state where element $i$ precedes element $j$, say,
$(\ldots, i, i_1, \ldots, i_k, j, \ldots)$. By successive transpositions using Equation (4.27),
we have

$$\pi(\ldots, i, i_1, \ldots, i_k, j, \ldots) = \left(\frac{P_i}{P_j}\right)^{k+1} \pi(\ldots, j, i_1, \ldots, i_k, i, \ldots) \tag{4.28}$$

For instance,

$$\begin{aligned}
\pi(1, 2, 3) &= \frac{P_2}{P_3}\, \pi(1, 3, 2) = \frac{P_2}{P_3} \frac{P_1}{P_3}\, \pi(3, 1, 2) \\
&= \frac{P_2}{P_3} \frac{P_1}{P_3} \frac{P_1}{P_2}\, \pi(3, 2, 1) = \left(\frac{P_1}{P_3}\right)^{2} \pi(3, 2, 1)
\end{aligned}$$

Now when $P_j > P_i$, Equation (4.28) implies that

$$\pi(\ldots, i, i_1, \ldots, i_k, j, \ldots) < \frac{P_i}{P_j}\, \pi(\ldots, j, i_1, \ldots, i_k, i, \ldots)$$

Letting $\alpha(i, j) = P\{e_i \text{ precedes } e_j\}$, we see by summing over all states for which $i$
precedes $j$ and by using the preceding that

$$\alpha(i, j) < \frac{P_i}{P_j}\, \alpha(j, i)$$

which, since $\alpha(i, j) = 1 - \alpha(j, i)$, yields

$$\alpha(j, i) > \frac{P_j}{P_j + P_i}$$

Hence, the average position of the element requested is indeed smaller under the
one-closer rule than under the front-of-the-line rule. ■
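The comparison of the two reordering rules can be checked numerically on a small list. The following is a minimal sketch, not part of the text's argument: the request probabilities `probs` are illustrative assumptions, elements are labeled 0, 1, 2 rather than 1, 2, 3, and the stationary probabilities are approximated by power iteration over all orderings.

```python
from itertools import permutations

def stationary(states, step, probs, iters=5000):
    """Approximate the stationary law of a list-reordering chain by power iteration."""
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        new = dict.fromkeys(states, 0.0)
        for s, mass in dist.items():
            for elem, p in enumerate(probs):
                new[step(s, elem)] += mass * p
        dist = new
    return dist

def front_of_line(state, elem):
    """The requested element moves to the first position."""
    return (elem,) + tuple(e for e in state if e != elem)

def one_closer(state, elem):
    """The requested element swaps with the element immediately ahead of it."""
    k = state.index(elem)
    if k == 0:
        return state
    lst = list(state)
    lst[k - 1], lst[k] = lst[k], lst[k - 1]
    return tuple(lst)

def average_position(dist, probs):
    """Long-run expected position (counted from 1) of the requested element."""
    return sum(mass * sum(p * (s.index(e) + 1) for e, p in enumerate(probs))
               for s, mass in dist.items())

probs = [0.6, 0.3, 0.1]  # assumed request probabilities, for illustration only
states = list(permutations(range(len(probs))))
pi_front = stationary(states, front_of_line, probs)
pi_closer = stationary(states, one_closer, probs)
avg_front = average_position(pi_front, probs)
avg_closer = average_position(pi_closer, probs)
print(avg_front, avg_closer)
```

With these probabilities the one-closer average should come out below the front-of-the-line average, and `pi_closer` should satisfy Equation (4.28), for example `pi_closer[(0, 1, 2)]` against `(probs[0] / probs[2]) ** 2 * pi_closer[(2, 1, 0)]`.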


The concept of the reversed chain is useful even when the process is not time 
reversible. To illustrate this, we start with the following proposition whose proof 
is left as an exercise. 


Proposition 4.6 Consider an irreducible Markov chain with transition probabilities
$P_{ij}$. If we can find positive numbers $\pi_i$, $i \geq 0$, summing to one, and a
transition probability matrix $Q = [Q_{ij}]$ such that

$$\pi_i P_{ij} = \pi_j Q_{ji} \tag{4.29}$$

then the $Q_{ij}$ are the transition probabilities of the reversed chain and the $\pi_i$ are
the stationary probabilities both for the original and reversed chain.

The importance of the preceding proposition is that, by thinking backward,
we can sometimes guess at the nature of the reversed chain and then use the set
of Equations (4.29) to obtain both the stationary probabilities and the $Q_{ij}$.


Example 4.38 A single bulb is necessary to light a given room. When the bulb
in use fails, it is replaced by a new one at the beginning of the next day. Let $X_n$
equal $i$ if the bulb in use at the beginning of day $n$ is in its $i$th day of use (that is, if
its present age is $i$). For instance, if a bulb fails on day $n - 1$, then a new bulb will
be put in use at the beginning of day $n$ and so $X_n = 1$. If we suppose that each
bulb, independently, fails on its $i$th day of use with probability $p_i$, $i \geq 1$, then it
is easy to see that $\{X_n, n \geq 1\}$ is a Markov chain whose transition probabilities
are as follows:

$$\begin{aligned}
P_{i,1} &= P\{\text{bulb, on its } i\text{th day of use, fails}\} \\
&= P\{\text{life of bulb} = i \mid \text{life of bulb} \geq i\} \\
&= \frac{P\{L = i\}}{P\{L \geq i\}}
\end{aligned}$$

where $L$, a random variable representing the lifetime of a bulb, is such that
$P\{L = i\} = p_i$. Also,

$$P_{i,i+1} = 1 - P_{i,1}$$


Suppose now that this chain has been in operation for a long (in theory, an infi- 
nite) time and consider the sequence of states going backward in time. Since, in 
the forward direction, the state is always increasing by 1 until it reaches the age 
at which the item fails, it is easy to see that the reverse chain will always decrease 
by 1 until it reaches 1 and then it will jump to a random value representing the 
lifetime of the (in real time) previous bulb. Thus, it seems that the reverse chain 
should have transition probabilities given by 


$$\begin{aligned}
Q_{i,i-1} &= 1, \quad i > 1 \\
Q_{1,i} &= p_i, \quad i \geq 1
\end{aligned}$$

To check this, and at the same time determine the stationary probabilities, we
must see if we can find, with the $Q_{i,j}$ as previously given, positive numbers $\{\pi_i\}$
such that

$$\pi_i P_{i,j} = \pi_j Q_{j,i}$$

To begin, let $j = 1$ and consider the resulting equations:

$$\pi_i P_{i,1} = \pi_1 Q_{1,i}$$

This is equivalent to

$$\pi_i \frac{P\{L = i\}}{P\{L \geq i\}} = \pi_1 P\{L = i\}$$

or

$$\pi_i = \pi_1 P\{L \geq i\}$$

Summing over all $i$ yields

$$1 = \sum_{i=1}^{\infty} \pi_i = \pi_1 \sum_{i=1}^{\infty} P\{L \geq i\} = \pi_1 E[L]$$

and so, for the preceding $Q_{i,j}$ to represent the reverse transition probabilities, it
is necessary for the stationary probabilities to be

$$\pi_i = \frac{P\{L \geq i\}}{E[L]}, \quad i \geq 1$$

To finish the proof that the reverse transition probabilities and stationary probabilities
are as given, all that remains is to show that they satisfy

$$\pi_i P_{i,i+1} = \pi_{i+1} Q_{i+1,i}$$

which is equivalent to

$$\frac{P\{L \geq i\}}{E[L]} \left(1 - \frac{P\{L = i\}}{P\{L \geq i\}}\right) = \frac{P\{L \geq i + 1\}}{E[L]}$$

and which is true since $P\{L \geq i\} - P\{L = i\} = P\{L \geq i + 1\}$. ■
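The check carried out in this example can be repeated with exact arithmetic for a concrete lifetime distribution. The sketch below is illustrative only: the four-point distribution `p` is a hypothetical choice, and the names `tail`, `P`, `Q`, and `pi` are introduced here for the demonstration.

```python
from fractions import Fraction

# Hypothetical lifetime distribution: P{L = i} = p[i] for i = 1, ..., 4.
p = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
m = max(p)
ages = range(1, m + 1)

def tail(i):
    """P{L >= i}."""
    return sum(prob for life, prob in p.items() if life >= i)

mean_life = sum(tail(i) for i in ages)  # E[L] = sum over i of P{L >= i}

# Forward chain: from age i the bulb fails (next state 1) or survives (next state i + 1).
P = {i: {} for i in ages}
for i in ages:
    fail = p[i] / tail(i)
    P[i][1] = fail
    if i < m:
        P[i][i + 1] = 1 - fail

# Guessed reversed chain: step down by one, and from 1 jump to a fresh lifetime.
Q = {i: {} for i in ages}
for i in range(2, m + 1):
    Q[i][i - 1] = Fraction(1)
for i in ages:
    Q[1][i] = p[i]

# Candidate stationary probabilities pi_i = P{L >= i} / E[L].
pi = {i: tail(i) / mean_life for i in ages}

# Condition (4.29): pi_i P_ij = pi_j Q_ji for every pair of states.
balanced = all(pi[i] * P[i].get(j, 0) == pi[j] * Q[j].get(i, 0)
               for i in ages for j in ages)
print(balanced, sum(pi.values()))
```

Because `Fraction` arithmetic is exact, `balanced` being true confirms Equation (4.29) for this particular distribution without any rounding tolerance.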


4.9 Markov Chain Monte Carlo Methods 


Let $\mathbf{X}$ be a discrete random vector whose set of possible values is $\mathbf{x}_j$, $j \geq 1$. Let
the probability mass function of $\mathbf{X}$ be given by $P\{\mathbf{X} = \mathbf{x}_j\}$, $j \geq 1$, and suppose
that we are interested in calculating

$$\theta = E[h(\mathbf{X})] = \sum_{j=1}^{\infty} h(\mathbf{x}_j)\, P\{\mathbf{X} = \mathbf{x}_j\}$$

for some specified function $h$. In situations where it is computationally difficult to
evaluate the function $h(\mathbf{x}_j)$, $j \geq 1$, we often turn to simulation to approximate $\theta$.
The usual approach, called Monte Carlo simulation, is to use random numbers
to generate a partial sequence of independent and identically distributed random
vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ having the mass function $P\{\mathbf{X} = \mathbf{x}_j\}$, $j \geq 1$ (see Chapter 11
for a discussion as to how this can be accomplished). Since the strong law of large
numbers yields

$$\lim_{n \to \infty} \sum_{i=1}^{n} \frac{h(\mathbf{X}_i)}{n} = \theta \tag{4.30}$$

it follows that we can estimate $\theta$ by letting $n$ be large and using the average of
the values of $h(\mathbf{X}_i)$, $i = 1, \ldots, n$, as the estimator.

It often, however, turns out that it is difficult to generate a random vector
having the specified probability mass function, particularly if $\mathbf{X}$ is a vector of
dependent random variables. In addition, its probability mass function is sometimes
given in the form $P\{\mathbf{X} = \mathbf{x}_j\} = C b_j$, $j \geq 1$, where the $b_j$ are specified, but
$C$ must be computed, and in many applications it is not computationally feasible
to sum the $b_j$ so as to determine $C$. Fortunately, however, there is another
way of using simulation to estimate $\theta$ in these situations. It works by generating
a sequence, not of independent random vectors, but of the successive states
of a vector-valued Markov chain $\mathbf{X}_1, \mathbf{X}_2, \ldots$ whose stationary probabilities are
$P\{\mathbf{X} = \mathbf{x}_j\}$, $j \geq 1$. If this can be accomplished, then it would follow from Proposition 4.4
that Equation (4.30) remains valid, implying that we can then use
$\sum_{i=1}^{n} h(\mathbf{X}_i)/n$ as an estimator of $\theta$.

We now show how to generate a Markov chain with arbitrary stationary probabilities
that may only be specified up to a multiplicative constant. Let $b(j)$,
$j = 1, 2, \ldots$ be positive numbers whose sum $B = \sum_{j=1}^{\infty} b(j)$ is finite. The following,
known as the Hastings-Metropolis algorithm, can be used to generate a
time reversible Markov chain whose stationary probabilities are

$$\pi(j) = b(j)/B, \quad j = 1, 2, \ldots$$

To begin, let $Q$ be any specified irreducible Markov transition probability matrix
on the integers, with $q(i, j)$ representing the row $i$ column $j$ element of $Q$. Now
define a Markov chain $\{X_n, n \geq 0\}$ as follows. When $X_n = i$, generate a random
variable $Y$ such that $P\{Y = j\} = q(i, j)$, $j = 1, 2, \ldots$. If $Y = j$, then set $X_{n+1}$ equal
to $j$ with probability $\alpha(i, j)$, and set it equal to $i$ with probability $1 - \alpha(i, j)$. Under
these conditions, it is easy to see that the sequence of states constitutes a Markov
chain with transition probabilities $P_{i,j}$ given by

$$\begin{aligned}
P_{i,j} &= q(i, j)\,\alpha(i, j), \quad \text{if } j \neq i \\
P_{i,i} &= q(i, i) + \sum_{k \neq i} q(i, k)\,(1 - \alpha(i, k))
\end{aligned}$$

This Markov chain will be time reversible and have stationary probabilities $\pi(j)$ if

$$\pi(i) P_{i,j} = \pi(j) P_{j,i} \quad \text{for } j \neq i$$

which is equivalent to

$$\pi(i)\, q(i, j)\, \alpha(i, j) = \pi(j)\, q(j, i)\, \alpha(j, i) \tag{4.31}$$

But if we take $\pi(j) = b(j)/B$ and set

$$\alpha(i, j) = \min\left(\frac{\pi(j)\, q(j, i)}{\pi(i)\, q(i, j)},\ 1\right) \tag{4.32}$$

then Equation (4.31) is easily seen to be satisfied. For if

$$\alpha(i, j) = \frac{\pi(j)\, q(j, i)}{\pi(i)\, q(i, j)}$$

then $\alpha(j, i) = 1$ and Equation (4.31) follows, and if $\alpha(i, j) = 1$ then

$$\alpha(j, i) = \frac{\pi(i)\, q(i, j)}{\pi(j)\, q(j, i)}$$

and again Equation (4.31) holds, thus showing that the Markov chain is time
reversible with stationary probabilities $\pi(j)$. Also, since $\pi(j) = b(j)/B$, we see
from (4.32) that

$$\alpha(i, j) = \min\left(\frac{b(j)\, q(j, i)}{b(i)\, q(i, j)},\ 1\right)$$

which shows that the value of $B$ is not needed to define the Markov chain, because
the values $b(j)$ suffice. Also, it is almost always the case that $\pi(j)$, $j \geq 1$ will not
only be stationary probabilities but will also be limiting probabilities. (Indeed, a
sufficient condition is that $P_{i,i} > 0$ for some $i$.)
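The algorithm just described can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the weights `b`, the uniform proposal `q`, the run length, and the seed are all hypothetical choices, and $B$ is computed only to check the output, never inside the sampler.

```python
import random

def hastings_metropolis(b, q_sample, q_prob, start, steps, rng):
    """Run the chain for the unnormalized weights b(j) and return the visited states."""
    x = start
    path = []
    for _ in range(steps):
        y = q_sample(x, rng)  # candidate Y with P{Y = j} = q(x, j)
        alpha = min(b[y] * q_prob(y, x) / (b[x] * q_prob(x, y)), 1.0)
        if rng.random() < alpha:  # accept the candidate with probability alpha(x, y)
            x = y
        path.append(x)
    return path

# Assumed target weights; the normalizing constant B = 15 is treated as unknown.
b = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
states = sorted(b)

def q_sample(i, rng):
    """Proposal: uniform over all states (irreducible, and q(i, i) > 0)."""
    return rng.choice(states)

def q_prob(i, j):
    return 1.0 / len(states)

rng = random.Random(0)
path = hastings_metropolis(b, q_sample, q_prob, 1, 200_000, rng)
freq = {j: path.count(j) / len(path) for j in states}
estimate = sum(path) / len(path)  # estimator of theta = E[h(X)] with h(x) = x
print(freq, estimate)
```

The empirical frequencies should be close to $b(j)/15$, and `estimate` close to $55/15$, even though the sampler only ever uses ratios of the $b(j)$.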


Example 4.39 Suppose that we want to generate a uniformly distributed element
in $\mathcal{S}$, the set of all permutations $(x_1, \ldots, x_n)$ of the numbers $(1, \ldots, n)$ for which
$\sum_{j=1}^{n} j x_j > a$ for a given constant $a$. To utilize the Hastings-Metropolis algorithm
we need to define an irreducible Markov transition probability matrix on the state
space $\mathcal{S}$. To accomplish this, we first define a concept of “neighboring” elements
of $\mathcal{S}$, and then construct a graph whose vertex set is $\mathcal{S}$. We start by putting an
arc between each pair of neighboring elements in $\mathcal{S}$, where any two permutations
in $\mathcal{S}$ are said to be neighbors if one results from an interchange of two of the
positions of the other. That is, (1, 2, 3, 4) and (1, 2, 4, 3) are neighbors whereas
(1, 2, 3, 4) and (1, 3, 4, 2) are not. Now, define the $q$ transition probability
function as follows. With $N(s)$ defined as the set of neighbors of $s$, and $|N(s)|$
equal to the number of elements in the set $N(s)$, let

$$q(s, t) = \frac{1}{|N(s)|} \quad \text{if } t \in N(s)$$

That is, the candidate next state from $s$ is equally likely to be any of its neighbors.
Since the desired limiting probabilities of the Markov chain are $\pi(s) = C$, it
follows that $\pi(s) = \pi(t)$, and so

$$\alpha(s, t) = \min(|N(s)|/|N(t)|,\ 1)$$

That is, if the present state of the Markov chain is $s$ then one of its neighbors
is randomly chosen, say, $t$. If $t$ is a state with fewer neighbors than $s$ (in graph
theory language, if the degree of vertex $t$ is less than that of vertex $s$), then the
next state is $t$. If not, a uniform (0, 1) random number $U$ is generated and the
next state is $t$ if $U < |N(s)|/|N(t)|$ and is $s$ otherwise. The limiting probabilities
of this Markov chain are $\pi(s) = 1/|\mathcal{S}|$, where $|\mathcal{S}|$ is the (unknown) number of
permutations in $\mathcal{S}$. ■
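A small instance of this example can be simulated directly. The sketch below assumes $n = 4$ and $a = 24$ (illustrative values), starts from the identity permutation, and takes neighbors to be those interchanges that remain in the set; the helper names are hypothetical. The full enumeration at the end is there only to check the result, and would not be available in a realistic problem.

```python
import random
from functools import lru_cache
from itertools import combinations, permutations

def weight(perm):
    """Sum of j * x_j, with positions j counted from 1."""
    return sum(j * x for j, x in enumerate(perm, start=1))

@lru_cache(maxsize=None)
def neighbors(perm, a):
    """Members of the set that differ from perm by one interchange of two positions."""
    out = []
    for i, j in combinations(range(len(perm)), 2):
        swapped = list(perm)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if weight(swapped) > a:
            out.append(tuple(swapped))
    return tuple(out)

def run_chain(n, a, steps, rng):
    """Hastings-Metropolis chain with alpha(s, t) = min(|N(s)| / |N(t)|, 1)."""
    s = tuple(range(1, n + 1))  # the identity permutation has the largest weight
    if weight(s) <= a:
        raise ValueError("the set is empty for this value of a")
    visits = {}
    for _ in range(steps):
        ns = neighbors(s, a)  # assumed non-empty, i.e. the set has at least two elements
        t = rng.choice(ns)
        if rng.random() < min(len(ns) / len(neighbors(t, a)), 1.0):
            s = t
        visits[s] = visits.get(s, 0) + 1
    return visits

n, a, steps = 4, 24, 200_000  # assumed illustrative values
visits = run_chain(n, a, steps, random.Random(1))
target = [p for p in permutations(range(1, n + 1)) if weight(p) > a]
print(len(target), sorted(v / steps for v in visits.values()))
```

Each visit frequency should be close to $1/|\mathcal{S}|$, where here $|\mathcal{S}|$ is `len(target)`.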


The most widely used version of the Hastings-Metropolis algorithm is the
Gibbs sampler. Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a discrete random vector with probability
mass function $p(\mathbf{x})$ that is only specified up to a multiplicative constant,
and suppose that we want to generate a random vector whose distribution is that
of $\mathbf{X}$. That is, we want to generate a random vector having mass function

$$p(\mathbf{x}) = C g(\mathbf{x})$$

where $g(\mathbf{x})$ is known, but $C$ is not. Utilization of the Gibbs sampler assumes that
for any $i$ and values $x_j$, $j \neq i$, we can generate a random variable $X$ having the
probability mass function

$$P\{X = x\} = P\{X_i = x \mid X_j = x_j, j \neq i\}$$

It operates by using the Hastings-Metropolis algorithm on a Markov chain with
states $\mathbf{x} = (x_1, \ldots, x_n)$, and with transition probabilities defined as follows.
Whenever the present state is $\mathbf{x}$, a coordinate that is equally likely to be any
of $1, \ldots, n$ is chosen. If coordinate $i$ is chosen, then a random variable $X$ with
probability mass function $P\{X = x\} = P\{X_i = x \mid X_j = x_j, j \neq i\}$ is generated. If
$X = x$, then the state $\mathbf{y} = (x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n)$ is considered as the candidate
next state. In other words, with $\mathbf{x}$ and $\mathbf{y}$ as given, the Gibbs sampler uses the
Hastings-Metropolis algorithm with

$$q(\mathbf{x}, \mathbf{y}) = \frac{1}{n} P\{X_i = x \mid X_j = x_j, j \neq i\} = \frac{p(\mathbf{y})}{n P\{X_j = x_j, j \neq i\}}$$

Because we want the limiting mass function to be $p$, we see from Equation (4.32)
that the vector $\mathbf{y}$ is then accepted as the new state with probability

$$\begin{aligned}
\alpha(\mathbf{x}, \mathbf{y}) &= \min\left(\frac{p(\mathbf{y})\, q(\mathbf{y}, \mathbf{x})}{p(\mathbf{x})\, q(\mathbf{x}, \mathbf{y})},\ 1\right) \\
&= \min\left(\frac{p(\mathbf{y})\, p(\mathbf{x})}{p(\mathbf{x})\, p(\mathbf{y})},\ 1\right) \\
&= 1
\end{aligned}$$

Hence, when utilizing the Gibbs sampler, the candidate state is always accepted
as the next state of the chain.


Example 4.40 Suppose that we want to generate $n$ uniformly distributed points
in the circle of radius 1 centered at the origin, conditional on the event that
no two points are within a distance $d$ of each other, when the probability of this
conditioning event is small. This can be accomplished by using the Gibbs sampler
as follows. Start with any $n$ points $x_1, \ldots, x_n$ in the circle that have the property
that no two of them are within $d$ of the other; then generate the value of $I$, equally
likely to be any of the values $1, \ldots, n$. Then continually generate a random point
in the circle until you obtain one that is not within $d$ of any of the other $n - 1$
points excluding $x_I$. At this point, replace $x_I$ by the generated point and then
repeat the operation. After a large number of iterations of this algorithm, the set
of $n$ points will approximately have the desired distribution. ■
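The procedure of this example translates almost line for line into code. The following is a minimal sketch under assumed values: five starting points spaced evenly on a circle of radius 0.8, a separation of $d = 0.5$, and a fixed seed; uniform points in the disc are drawn by rejection from the enclosing square, which is one convenient choice among several.

```python
import math
import random

def random_point_in_disc(rng):
    """Uniform point in the unit disc, by rejection from the enclosing square."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return (x, y)

def gibbs_points(points, d, iterations, rng):
    """Repeatedly replace a randomly chosen point, keeping all pairs more than d apart."""
    pts = list(points)
    for _ in range(iterations):
        i = rng.randrange(len(pts))  # the index I, equally likely to be any point
        others = pts[:i] + pts[i + 1:]
        while True:  # redraw until the candidate is far enough from the others
            cand = random_point_in_disc(rng)
            if all(math.dist(cand, o) > d for o in others):
                break
        pts[i] = cand
    return pts

# Assumed starting configuration: 5 points on a circle of radius 0.8.
n, d = 5, 0.5
start = [(0.8 * math.cos(2 * math.pi * k / n), 0.8 * math.sin(2 * math.pi * k / n))
         for k in range(n)]
pts = gibbs_points(start, d, 2000, random.Random(42))
print(pts)
```

Every intermediate configuration satisfies the separation constraint, so the chain never leaves the conditioning event; how many iterations count as "large" depends on $n$ and $d$ and is not settled by this sketch.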


Example 4.41 Let $X_i$, $i = 1, \ldots, n$, be independent exponential random variables
with respective rates $\lambda_i$, $i = 1, \ldots, n$. Let $S = \sum_{i=1}^{n} X_i$, and suppose that we want
to generate the random vector $\mathbf{X} = (X_1, \ldots, X_n)$, conditional on the event that
$S > c$ for some large positive constant $c$. That is, we want to generate the value
of a random vector whose density function is

$$f(x_1, \ldots, x_n) = \frac{1}{P\{S > c\}} \prod_{i=1}^{n} \lambda_i e^{-\lambda_i x_i}, \quad x_i \geq 0,\ \sum_{i=1}^{n} x_i > c$$

This is easily accomplished by starting with an initial vector $\mathbf{x} = (x_1, \ldots, x_n)$
satisfying $x_i > 0$, $i = 1, \ldots, n$, $\sum_{i=1}^{n} x_i > c$. Then generate a random variable $I$
that is equally likely to be any of $1, \ldots, n$. Next, generate an exponential random
variable $X$ with rate $\lambda_I$ conditional on the event that $X + \sum_{j \neq I} x_j > c$. This
latter step, which calls for generating the value of an exponential random variable
given that it exceeds $c - \sum_{j \neq I} x_j$, is easily accomplished by using the fact that an
exponential conditioned to be greater than a positive constant is distributed as
the constant plus the exponential. Consequently, to obtain $X$, first generate an
exponential random variable $Y$ with rate $\lambda_I$, and then set

$$X = Y + \Bigl(c - \sum_{j \neq I} x_j\Bigr)^{+}$$

where $a^{+} = \max(a, 0)$.
The value of $x_I$ should then be reset as $X$ and a new iteration of the algorithm
begun. ■
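A sketch of this Gibbs step follows. The rates, the threshold `c`, the starting vector, and the seed are assumptions made for the illustration; `random.expovariate` from the standard library supplies the exponential $Y$.

```python
import random

def gibbs_exponentials(rates, c, iterations, rng):
    """Gibbs sampler for independent exponentials conditioned on their sum exceeding c."""
    n = len(rates)
    x = [c / n + 1.0] * n  # assumed starting vector: positive entries with sum > c
    for _ in range(iterations):
        i = rng.randrange(n)  # coordinate I, equally likely to be any index
        rest = sum(x[j] for j in range(n) if j != i)
        y = rng.expovariate(rates[i])  # exponential Y with rate lambda_I
        x[i] = y + max(c - rest, 0.0)  # X = Y + (c - sum of the others)^+
    return x

rates = [1.0, 2.0, 3.0]  # assumed rates lambda_i
c = 10.0                 # assumed threshold
x = gibbs_exponentials(rates, c, 5000, random.Random(7))
print(x, sum(x))
```

Each update leaves the sum above `c`, because the new coordinate is at least the shortfall `c - rest` plus a positive exponential.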


Remark As can be seen by Examples 4.40 and 4.41, although the theory for the
Gibbs sampler was presented under the assumption that the distribution to be
generated was discrete, it also holds when this distribution is continuous.


4.10 Markov Decision Processes 


Consider a process that is observed at discrete time points to be in any one of 
M possible states, which we number by 1,2,...,M. After observing the state of 
the process, an action must be chosen, and we let A, assumed finite, denote the 
set of all possible actions. 

If the process is in state $i$ at time $n$ and action $a$ is chosen, then the next state
of the system is determined according to the transition probabilities $P_{ij}(a)$. If we
let $X_n$ denote the state of the process at time $n$ and $a_n$ the action chosen at time
$n$, then the preceding is equivalent to stating that

$$P\{X_{n+1} = j \mid X_0, a_0, X_1, a_1, \ldots, X_n = i, a_n = a\} = P_{ij}(a)$$

Thus, the transition probabilities are functions only of the present state and the
subsequent action.

By a policy, we mean a rule for choosing actions. We shall restrict ourselves
to policies that are of the form that the action they prescribe at any time
depends only on the state of the process at that time (and not on any information
concerning prior states and actions). However, we shall allow the policy
to be “randomized” in that its instructions may be to choose actions according
to a probability distribution. In other words, a policy $\beta$ is a set of numbers
$\beta = \{\beta_i(a), a \in A, i = 1, \ldots, M\}$ with the interpretation that if the process is
in state $i$, then action $a$ is to be chosen with probability $\beta_i(a)$. Of course, we
need have

$$\begin{aligned}
&0 \leq \beta_i(a) \leq 1, \quad \text{for all } i, a \\
&\sum_a \beta_i(a) = 1, \quad \text{for all } i
\end{aligned}$$

Under any given policy $\beta$, the sequence of states $\{X_n, n = 0, 1, \ldots\}$ constitutes
a Markov chain with transition probabilities $P_{ij}(\beta)$ given by

$$\begin{aligned}
P_{ij}(\beta) &= P_\beta\{X_{n+1} = j \mid X_n = i\}^{*} \\
&= \sum_a P_{ij}(a)\, \beta_i(a)
\end{aligned}$$

where the last equality follows by conditioning on the action chosen when in
state $i$. Let us suppose that for every choice of a policy $\beta$, the resultant Markov
chain $\{X_n, n = 0, 1, \ldots\}$ is ergodic.

* We use the notation $P_\beta$ to signify that the probability is conditional on the fact that policy $\beta$ is used.

For any policy $\beta$, let $\pi_{ia}$ denote the limiting (or steady-state) probability that
the process will be in state $i$ and action $a$ will be chosen if policy $\beta$ is employed.
That is,

$$\pi_{ia} = \lim_{n \to \infty} P_\beta\{X_n = i, a_n = a\}$$

The vector $\pi = (\pi_{ia})$ must satisfy

(i) $\pi_{ia} \geq 0$ for all $i$, $a$,
(ii) $\sum_i \sum_a \pi_{ia} = 1$,
(iii) $\sum_a \pi_{ja} = \sum_i \sum_a \pi_{ia} P_{ij}(a)$ for all $j$ (4.33)

Equations (i) and (ii) are obvious, and Equation (iii), which is an analogue of
Equation (4.7), follows as the left-hand side equals the steady-state probability
of being in state $j$ and the right-hand side is the same probability computed by
conditioning on the state and action chosen one stage earlier.

Thus for any policy $\beta$, there is a vector $\pi = (\pi_{ia})$ that satisfies (i)-(iii) and with
the interpretation that $\pi_{ia}$ is equal to the steady-state probability of being in state $i$
and choosing action $a$ when policy $\beta$ is employed. Moreover, it turns out that the
reverse is also true. Namely, for any vector $\pi = (\pi_{ia})$ that satisfies (i)-(iii), there
exists a policy $\beta$ such that if $\beta$ is used, then the steady-state probability of being
in $i$ and choosing action $a$ equals $\pi_{ia}$. To verify this last statement, suppose that
$\pi = (\pi_{ia})$ is a vector that satisfies (i)-(iii). Then, let the policy $\beta = (\beta_i(a))$ be

$$\beta_i(a) = P\{\beta \text{ chooses } a \mid \text{state is } i\} = \frac{\pi_{ia}}{\sum_a \pi_{ia}}$$

Now let $P_{ia}$ denote the limiting probability of being in $i$ and choosing $a$ when
policy $\beta$ is employed. We need to show that $P_{ia} = \pi_{ia}$. To do so, first note that
$\{P_{ia}, i = 1, \ldots, M, a \in A\}$ are the limiting probabilities of the two-dimensional
Markov chain $\{(X_n, a_n), n \geq 0\}$. Hence, by the fundamental Theorem 4.1, they
are the unique solution of

(i') $P_{ia} \geq 0$,
(ii') $\sum_i \sum_a P_{ia} = 1$,
(iii') $P_{ja} = \sum_i \sum_{a'} P_{ia'} P_{ij}(a')\, \beta_j(a)$

where (iii') follows since

$$P\{X_{n+1} = j, a_{n+1} = a \mid X_n = i, a_n = a'\} = P_{ij}(a')\, \beta_j(a)$$

Because

$$\beta_j(a) = \frac{\pi_{ja}}{\sum_a \pi_{ja}}$$

we see that $(P_{ia})$ is the unique solution of

$$\begin{aligned}
&P_{ia} \geq 0, \\
&\sum_i \sum_a P_{ia} = 1, \\
&P_{ja} = \sum_i \sum_{a'} P_{ia'} P_{ij}(a')\, \frac{\pi_{ja}}{\sum_a \pi_{ja}}
\end{aligned}$$

Hence, to show that $P_{ia} = \pi_{ia}$, we need show that

$$\begin{aligned}
&\pi_{ia} \geq 0, \\
&\sum_i \sum_a \pi_{ia} = 1, \\
&\pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a')\, \frac{\pi_{ja}}{\sum_a \pi_{ja}}
\end{aligned}$$

The top two equations follow from (i) and (ii) of Equation (4.33), and the third,
which is equivalent to

$$\sum_a \pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a')$$

follows from condition (iii) of Equation (4.33).

Thus we have shown that a vector $\pi = (\pi_{ia})$ will satisfy (i), (ii), and (iii) of
Equation (4.33) if and only if there exists a policy $\beta$ such that $\pi_{ia}$ is equal to the
steady-state probability of being in state $i$ and choosing action $a$ when $\beta$ is used.
In fact, the policy $\beta$ is defined by $\beta_i(a) = \pi_{ia} / \sum_a \pi_{ia}$.
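The correspondence between policies and vectors satisfying (4.33) can be seen on a toy model. The sketch below assumes a hypothetical model with two states and two actions (indexed from 0), a hypothetical randomized policy `beta`, and power iteration for the stationary probabilities; none of these numbers come from the text.

```python
def chain_under_policy(P, beta):
    """P[a][i][j] = P_ij(a) and beta[i][a] = beta_i(a); returns the matrix P_ij(beta)."""
    M, A = len(beta), len(P)
    return [[sum(P[a][i][j] * beta[i][a] for a in range(A)) for j in range(M)]
            for i in range(M)]

def stationary(T, iters=5000):
    """Approximate the stationary probabilities of an ergodic chain by power iteration."""
    M = len(T)
    v = [1.0 / M] * M
    for _ in range(iters):
        v = [sum(v[i] * T[i][j] for i in range(M)) for j in range(M)]
    return v

# Hypothetical model: transition matrices for action 0 and action 1.
P = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.7, 0.3]],
]
beta = [[0.5, 0.5], [0.25, 0.75]]  # assumed randomized policy beta_i(a)

state_prob = stationary(chain_under_policy(P, beta))
pi = [[state_prob[i] * beta[i][a] for a in range(2)] for i in range(2)]  # pi_ia

# Condition (iii) of (4.33): sum_a pi_ja = sum_i sum_a pi_ia P_ij(a) for each j.
lhs = [sum(pi[j]) for j in range(2)]
rhs = [sum(pi[i][a] * P[a][i][j] for i in range(2) for a in range(2)) for j in range(2)]

# Recover the policy from pi via beta_i(a) = pi_ia / sum_a pi_ia.
recovered = [[pi[i][a] / sum(pi[i]) for a in range(2)] for i in range(2)]
print(lhs, rhs, recovered)
```

Here `lhs` and `rhs` agree up to rounding, and `recovered` reproduces `beta`, which is the two-way correspondence established above.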

The preceding is quite important in the determination of “optimal” policies.
For instance, suppose that a reward $R(i, a)$ is earned whenever action $a$ is chosen
in state $i$. Since $R(X_i, a_i)$ would then represent the reward earned at time $i$, the
expected average reward per unit time under policy $\beta$ can be expressed as

$$\text{expected average reward under } \beta = \lim_{n \to \infty} E_\beta\left[\frac{\sum_{i=1}^{n} R(X_i, a_i)}{n}\right]$$

Now, if $\pi_{ia}$ denotes the steady-state probability of being in state $i$ and choosing
action $a$, it follows that the limiting expected reward at time $n$ equals

$$\lim_{n \to \infty} E[R(X_n, a_n)] = \sum_i \sum_a \pi_{ia} R(i, a)$$

which implies that

$$\text{expected average reward under } \beta = \sum_i \sum_a \pi_{ia} R(i, a)$$

Hence, the problem of determining the policy that maximizes the expected average
reward is

$$\begin{aligned}
\underset{\pi = (\pi_{ia})}{\text{maximize}} \quad & \sum_i \sum_a \pi_{ia} R(i, a) \\
\text{subject to} \quad & \pi_{ia} \geq 0, \quad \text{for all } i, a, \\
& \sum_i \sum_a \pi_{ia} = 1, \\
& \sum_a \pi_{ja} = \sum_i \sum_a \pi_{ia} P_{ij}(a), \quad \text{for all } j
\end{aligned} \tag{4.34}$$

However, the preceding maximization problem is a special case of what is known
as a linear program and can be solved by a standard linear programming algorithm
known as the simplex algorithm.* If $\pi^* = (\pi^*_{ia})$ maximizes the preceding,
then the optimal policy will be given by $\beta^*$ where

$$\beta^*_i(a) = \frac{\pi^*_{ia}}{\sum_a \pi^*_{ia}}$$

Remarks 


(i) It can be shown that there is a $\pi^*$ maximizing Equation (4.34) that has the property
that for each $i$, $\pi^*_{ia}$ is zero for all but one value of $a$, which implies that the optimal
policy is nonrandomized. That is, the action it prescribes when in state $i$ is a
deterministic function of $i$.

(ii) The linear programming formulation also often works when there are restrictions
placed on the class of allowable policies. For instance, suppose there is a restriction
on the fraction of time the process spends in some state, say, state 1. Specifically,
suppose that we are allowed to consider only policies having the property that their
use results in the process being in state 1 less than $100\alpha$ percent of time. To determine
the optimal policy subject to this requirement, we add to the linear programming
problem the additional constraint

$$\sum_a \pi_{1a} \leq \alpha$$

since $\sum_a \pi_{1a}$ represents the proportion of time that the process is in state 1.


* It is called a linear program since the objective function $\sum_i \sum_a R(i, a)\pi_{ia}$ and the constraints are
all linear functions of the $\pi_{ia}$. For a heuristic analysis of the simplex algorithm, see 4.5.2.
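For a very small model the optimal policy can be found without a linear programming solver. The sketch below does not implement the simplex algorithm; it relies instead on Remark (i), which guarantees that some nonrandomized policy is optimal, and simply enumerates all deterministic policies. The transition matrices `P` and rewards `R` are hypothetical, and this brute-force search is only sensible when the number of states and actions is tiny.

```python
from itertools import product

def stationary(T, iters=5000):
    """Approximate the stationary probabilities of an ergodic chain by power iteration."""
    M = len(T)
    v = [1.0 / M] * M
    for _ in range(iters):
        v = [sum(v[i] * T[i][j] for i in range(M)) for j in range(M)]
    return v

def average_reward(P, R, policy):
    """Long-run average reward when action policy[i] is always used in state i."""
    M = len(policy)
    T = [P[policy[i]][i] for i in range(M)]  # row i of the chain under this policy
    v = stationary(T)
    return sum(v[i] * R[i][policy[i]] for i in range(M))

# Hypothetical model: P[a][i][j] = P_ij(a) and R[i][a] = R(i, a), states and actions from 0.
P = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.7, 0.3]],
]
R = [[1.0, 0.0], [2.0, 3.0]]

policies = list(product(range(2), repeat=2))
best = max(policies, key=lambda pol: average_reward(P, R, pol))
best_value = average_reward(P, R, best)
print(best, best_value)
```

For these numbers the search selects action 1 in both states, with average reward 1.6; a constraint such as the one in Remark (ii) could be imposed by filtering `policies`, although a restricted optimum may then require a randomized policy that this enumeration cannot represent.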
4.11 Hidden Markov Chains


Let $\{X_n, n = 1, 2, \ldots\}$ be a Markov chain with transition probabilities $P_{i,j}$ and
initial state probabilities $p_i = P\{X_1 = i\}$, $i \geq 0$. Suppose that there is a finite set
$\mathcal{S}$ of signals, and that a signal from $\mathcal{S}$ is emitted each time the Markov chain
enters a state. Further, suppose that when the Markov chain enters state $j$ then,
independently of previous Markov chain states and signals, the signal emitted is
$s$ with probability $p(s \mid j)$, $\sum_{s \in \mathcal{S}} p(s \mid j) = 1$. That is, if $S_n$ represents the $n$th signal
emitted, then

$$\begin{aligned}
P\{S_1 = s \mid X_1 = j\} &= p(s \mid j), \\
P\{S_n = s \mid X_1, S_1, \ldots, X_{n-1}, S_{n-1}, X_n = j\} &= p(s \mid j)
\end{aligned}$$

A model of the preceding type in which the sequence of signals $S_1, S_2, \ldots$ is
observed, while the sequence of underlying Markov chain states $X_1, X_2, \ldots$ is
unobserved, is called a hidden Markov chain model.


Example 4.42 Consider a production process that in each period is either in a 
good state (state 1) or in a poor state (state 2). If the process is in state 1 during 
a period then, independent of the past, with probability 0.9 it will be in state 
1 during the next period and with probability 0.1 it will be in state 2. Once in 
state 2, it remains in that state forever. Suppose that a single item is produced each 
period and that each item produced when the process is in state 1 is of acceptable 
quality with probability 0.99, while each item produced when the process is in 
state 2 is of acceptable quality with probability 0.96. 

If the status, either acceptable or unacceptable, of each successive item is 
observed, while the process states are unobservable, then the preceding is a hid- 
den Markov chain model. The signal is the status of the item produced, and has 
value either a or u, depending on whether the item is acceptable or unacceptable. 
The signal probabilities are 


$$\begin{aligned}
p(u \mid 1) &= 0.01, & p(a \mid 1) &= 0.99, \\
p(u \mid 2) &= 0.04, & p(a \mid 2) &= 0.96
\end{aligned}$$

while the transition probabilities of the underlying Markov chain are

$$P_{1,1} = 0.9 = 1 - P_{1,2}, \qquad P_{2,2} = 1$$

■


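Although $\{S_n, n \geq 1\}$ is not a Markov chain, it should be noted that, conditional
on the current state $X_n$, the sequence $S_n, X_{n+1}, S_{n+1}, \ldots$ of future signals and states
is independent of the sequence $X_1, S_1, \ldots, X_{n-1}, S_{n-1}$ of past states and signals.

Let $\mathbf{S}^n = (S_1, \ldots, S_n)$ be the random vector of the first $n$ signals. For a
fixed sequence of signals $s_1, \ldots, s_n$, let $\mathbf{s}_k = (s_1, \ldots, s_k)$, $k \leq n$. To begin, let us
determine the conditional probability of the Markov chain state at time $n$ given
that $\mathbf{S}^n = \mathbf{s}_n$. To obtain this probability, let

$$F_n(j) = P\{\mathbf{S}^n = \mathbf{s}_n, X_n = j\}$$

and note that

$$P\{X_n = j \mid \mathbf{S}^n = \mathbf{s}_n\} = \frac{P\{\mathbf{S}^n = \mathbf{s}_n, X_n = j\}}{P\{\mathbf{S}^n = \mathbf{s}_n\}} = \frac{F_n(j)}{\sum_i F_n(i)}$$

Now,

$$\begin{aligned}
F_n(j) &= P\{\mathbf{S}^{n-1} = \mathbf{s}_{n-1}, S_n = s_n, X_n = j\} \\
&= \sum_i P\{\mathbf{S}^{n-1} = \mathbf{s}_{n-1}, X_{n-1} = i, X_n = j, S_n = s_n\} \\
&= \sum_i F_{n-1}(i)\, P\{X_n = j, S_n = s_n \mid \mathbf{S}^{n-1} = \mathbf{s}_{n-1}, X_{n-1} = i\} \\
&= \sum_i F_{n-1}(i)\, P\{X_n = j, S_n = s_n \mid X_{n-1} = i\} \\
&= \sum_i F_{n-1}(i)\, P_{i,j}\, p(s_n \mid j) \\
&= p(s_n \mid j) \sum_i F_{n-1}(i)\, P_{i,j}
\end{aligned} \tag{4.35}$$

where the preceding used that

$$\begin{aligned}
P\{X_n = j, S_n = s_n \mid X_{n-1} = i\}
&= P\{X_n = j \mid X_{n-1} = i\} \times P\{S_n = s_n \mid X_n = j, X_{n-1} = i\} \\
&= P_{i,j}\, P\{S_n = s_n \mid X_n = j\} \\
&= P_{i,j}\, p(s_n \mid j)
\end{aligned}$$

Starting with

$$F_1(i) = P\{X_1 = i, S_1 = s_1\} = p_i\, p(s_1 \mid i)$$

we can use Equation (4.35) to recursively determine the functions $F_2(i)$,
$F_3(i), \ldots$, up to $F_n(i)$.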
Example 4.43 Suppose in Example 4.42 that $P\{X_1 = 1\} = 0.8$. It is given that
the successive conditions of the first three items produced are $a, u, a$.

(a) What is the probability that the process was in its good state when the third item was
produced?

(b) What is the probability that $X_4$ is 1?

(c) What is the probability that the next item produced is acceptable?

Solution: With $\mathbf{s}_3 = (a, u, a)$, we have

$$\begin{aligned}
F_1(1) &= (0.8)(0.99) = 0.792, \\
F_1(2) &= (0.2)(0.96) = 0.192 \\
F_2(1) &= 0.01[0.792(0.9) + 0.192(0)] = 0.007128, \\
F_2(2) &= 0.04[0.792(0.1) + 0.192(1)] = 0.010848 \\
F_3(1) &= 0.99[(0.007128)(0.9)] \approx 0.006351, \\
F_3(2) &= 0.96[(0.007128)(0.1) + 0.010848] \approx 0.011098
\end{aligned}$$

Therefore, the answer to part (a) is

$$P\{X_3 = 1 \mid \mathbf{s}_3\} \approx \frac{0.006351}{0.006351 + 0.011098} \approx 0.364$$

To compute $P\{X_4 = 1 \mid \mathbf{s}_3\}$, condition on $X_3$ to obtain

$$\begin{aligned}
P\{X_4 = 1 \mid \mathbf{s}_3\} &= P\{X_4 = 1 \mid X_3 = 1, \mathbf{s}_3\}\, P\{X_3 = 1 \mid \mathbf{s}_3\} \\
&\quad + P\{X_4 = 1 \mid X_3 = 2, \mathbf{s}_3\}\, P\{X_3 = 2 \mid \mathbf{s}_3\} \\
&= P\{X_4 = 1 \mid X_3 = 1\}(0.364) + P\{X_4 = 1 \mid X_3 = 2\}(0.636) \\
&= 0.364 P_{1,1} + 0.636 P_{2,1} \\
&= 0.3276
\end{aligned}$$

To compute $P\{S_4 = a \mid \mathbf{s}_3\}$, condition on $X_4$ to obtain

$$\begin{aligned}
P\{S_4 = a \mid \mathbf{s}_3\} &= P\{S_4 = a \mid X_4 = 1, \mathbf{s}_3\}\, P\{X_4 = 1 \mid \mathbf{s}_3\} \\
&\quad + P\{S_4 = a \mid X_4 = 2, \mathbf{s}_3\}\, P\{X_4 = 2 \mid \mathbf{s}_3\} \\
&= P\{S_4 = a \mid X_4 = 1\}(0.3276) + P\{S_4 = a \mid X_4 = 2\}(1 - 0.3276) \\
&= (0.99)(0.3276) + (0.96)(0.6724) = 0.9698
\end{aligned}$$

■
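The recursion (4.35) and the three answers of this example can be reproduced with a short program. This is a sketch using the data of Examples 4.42 and 4.43; the function name `forward` and the zero-based state indexing (index 0 for state 1, index 1 for state 2) are conventions chosen here, not the text's.

```python
def forward(p0, P, emit, signals):
    """Return [F_1, ..., F_n], where F_k[j] = P{S^k = s_k, X_k = j} as in (4.35)."""
    N = len(p0)
    F = [[p0[i] * emit[i][signals[0]] for i in range(N)]]
    for s in signals[1:]:
        prev = F[-1]
        F.append([emit[j][s] * sum(prev[i] * P[i][j] for i in range(N))
                  for j in range(N)])
    return F

p0 = [0.8, 0.2]                 # P{X_1 = 1} = 0.8
P = [[0.9, 0.1], [0.0, 1.0]]    # state 2 is absorbing
emit = [{"a": 0.99, "u": 0.01}, {"a": 0.96, "u": 0.04}]

F = forward(p0, P, emit, ["a", "u", "a"])
F3 = F[-1]
post3 = [f / sum(F3) for f in F3]  # part (a): P{X_3 = j | s_3}
next_state = [sum(post3[i] * P[i][j] for i in range(2)) for j in range(2)]  # part (b)
p_accept = sum(next_state[j] * emit[j]["a"] for j in range(2))              # part (c)
print(F3, post3[0], next_state[0], p_accept)
```

The unrounded values differ from the text's in the fourth decimal place at most, since the text carries the rounded 0.364 forward into parts (b) and (c).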


To compute $P\{\mathbf{S}^n = \mathbf{s}_n\}$, use the identity $P\{\mathbf{S}^n = \mathbf{s}_n\} = \sum_i F_n(i)$ along with
Equation (4.35). If there are $N$ states of the Markov chain, this requires computing
$nN$ quantities $F_n(i)$, with each computation requiring a summation over
$N$ terms. This can be compared with a computation of $P\{\mathbf{S}^n = \mathbf{s}_n\}$ based on
conditioning on the first $n$ states of the Markov chain to obtain

$$\begin{aligned}
P\{\mathbf{S}^n = \mathbf{s}_n\} &= \sum_{i_1, \ldots, i_n} P\{\mathbf{S}^n = \mathbf{s}_n \mid X_1 = i_1, \ldots, X_n = i_n\}\, P\{X_1 = i_1, \ldots, X_n = i_n\} \\
&= \sum_{i_1, \ldots, i_n} p(s_1 \mid i_1) \cdots p(s_n \mid i_n)\, p_{i_1} P_{i_1, i_2} P_{i_2, i_3} \cdots P_{i_{n-1}, i_n}
\end{aligned}$$

The use of the preceding identity to compute $P\{\mathbf{S}^n = \mathbf{s}_n\}$ would thus require a
summation over $N^n$ terms, with each term being a product of $2n$ values, indicating
that it is not competitive with the previous approach.

The computation of $P\{\mathbf{S}^n = \mathbf{s}_n\}$ by recursively determining the functions $F_k(i)$
is known as the forward approach. There also is a backward approach, which is
based on the quantities $B_k(i)$, defined by

$$B_k(i) = P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_k = i\}$$

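A recursive formula for $B_k(i)$ can be obtained by conditioning on $X_{k+1}$.

$$\begin{aligned}
B_k(i) &= \sum_j P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_k = i, X_{k+1} = j\}\, P\{X_{k+1} = j \mid X_k = i\} \\
&= \sum_j P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_{k+1} = j\}\, P_{i,j} \\
&= \sum_j P\{S_{k+1} = s_{k+1} \mid X_{k+1} = j\} \\
&\qquad \times P\{S_{k+2} = s_{k+2}, \ldots, S_n = s_n \mid S_{k+1} = s_{k+1}, X_{k+1} = j\}\, P_{i,j} \\
&= \sum_j p(s_{k+1} \mid j)\, P\{S_{k+2} = s_{k+2}, \ldots, S_n = s_n \mid X_{k+1} = j\}\, P_{i,j} \\
&= \sum_j p(s_{k+1} \mid j)\, B_{k+1}(j)\, P_{i,j}
\end{aligned} \tag{4.36}$$

Starting with

$$B_{n-1}(i) = P\{S_n = s_n \mid X_{n-1} = i\} = \sum_j P_{i,j}\, p(s_n \mid j)$$

we would then use Equation (4.36) to determine the function $B_{n-2}(i)$, then
$B_{n-3}(i)$, and so on, down to $B_1(i)$. This would then yield $P\{\mathbf{S}^n = \mathbf{s}_n\}$ via

$$\begin{aligned}
P\{\mathbf{S}^n = \mathbf{s}_n\} &= \sum_i P\{S_1 = s_1, \ldots, S_n = s_n \mid X_1 = i\}\, p_i \\
&= \sum_i P\{S_1 = s_1 \mid X_1 = i\}\, P\{S_2 = s_2, \ldots, S_n = s_n \mid S_1 = s_1, X_1 = i\}\, p_i \\
&= \sum_i p(s_1 \mid i)\, P\{S_2 = s_2, \ldots, S_n = s_n \mid X_1 = i\}\, p_i \\
&= \sum_i p(s_1 \mid i)\, B_1(i)\, p_i
\end{aligned}$$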

Another approach to obtaining $P\{\mathbf{S}^n = \mathbf{s}_n\}$ is to combine both the forward and
backward approaches. Suppose that for some $k$ we have computed both functions
$F_k(j)$ and $B_k(j)$. Because

$$\begin{aligned}
P\{\mathbf{S}^n = \mathbf{s}_n, X_k = j\} &= P\{\mathbf{S}^k = \mathbf{s}_k, X_k = j\} \\
&\qquad \times P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid \mathbf{S}^k = \mathbf{s}_k, X_k = j\} \\
&= P\{\mathbf{S}^k = \mathbf{s}_k, X_k = j\}\, P\{S_{k+1} = s_{k+1}, \ldots, S_n = s_n \mid X_k = j\} \\
&= F_k(j)\, B_k(j)
\end{aligned}$$

we see that

$$P\{\mathbf{S}^n = \mathbf{s}_n\} = \sum_j F_k(j)\, B_k(j)$$

The beauty of using the preceding identity to determine $P\{\mathbf{S}^n = \mathbf{s}_n\}$ is that we
may simultaneously compute the sequence of forward functions, starting with
$F_1$, as well as the sequence of backward functions, starting at $B_{n-1}$. The parallel
computations can then be stopped once we have computed both $F_k$ and $B_k$ for
some $k$.
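The backward recursion (4.36) and the combined identity can be checked on the model of Example 4.42. In the sketch below the function names are hypothetical, states are indexed from 0, and the convention $B_n(i) = 1$ is an assumption added so that the recursion has a uniform starting point; with it, the first backward step reproduces $B_{n-1}(i) = \sum_j P_{i,j}\, p(s_n \mid j)$.

```python
def forward(p0, P, emit, signals):
    """Return [F_1, ..., F_n] computed by recursion (4.35)."""
    N = len(p0)
    F = [[p0[i] * emit[i][signals[0]] for i in range(N)]]
    for s in signals[1:]:
        prev = F[-1]
        F.append([emit[j][s] * sum(prev[i] * P[i][j] for i in range(N))
                  for j in range(N)])
    return F

def backward(P, emit, signals):
    """Return [B_1, ..., B_n] computed by recursion (4.36), taking B_n(i) = 1."""
    N = len(P)
    B = [[1.0] * N]
    for s in reversed(signals[1:]):  # s plays the role of s_{k+1} when computing B_k
        nxt = B[0]
        B.insert(0, [sum(emit[j][s] * nxt[j] * P[i][j] for j in range(N))
                     for i in range(N)])
    return B

p0 = [0.8, 0.2]
P = [[0.9, 0.1], [0.0, 1.0]]
emit = [{"a": 0.99, "u": 0.01}, {"a": 0.96, "u": 0.04}]
signals = ["a", "u", "a"]

F = forward(p0, P, emit, signals)
B = backward(P, emit, signals)
# sum_j F_k(j) B_k(j) should equal P{S^n = s_n} for every k.
totals = [sum(F[k][j] * B[k][j] for j in range(2)) for k in range(len(signals))]
print(totals)
```

All entries of `totals` coincide, which is why the forward and backward passes can be stopped as soon as they meet at a common index $k$.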


4.11.1 Predicting the States 


Suppose the first $n$ observed signals are $\mathbf{s}_n = (s_1, \ldots, s_n)$, and that given this data
we want to predict the first $n$ states of the Markov chain. The best predictor
depends on what we are trying to accomplish. If our objective is to maximize the
expected number of states that are correctly predicted, then for each $k = 1, \ldots, n$
we need to compute $P\{X_k = j \mid \mathbf{S}^n = \mathbf{s}_n\}$ and then let the value of $j$ that maximizes
this quantity be the predictor of $X_k$. (That is, we take the mode of the conditional
probability mass function of $X_k$, given the sequence of signals, as the predictor of
$X_k$.) To do so, we must first compute this conditional probability mass function,
which is accomplished as follows. For $k \leq n$ (taking $B_n(j) = 1$ when $k = n$),

$$P\{X_k = j \mid \mathbf{S}^n = \mathbf{s}_n\} = \frac{P\{\mathbf{S}^n = \mathbf{s}_n, X_k = j\}}{P\{\mathbf{S}^n = \mathbf{s}_n\}} = \frac{F_k(j)\, B_k(j)}{\sum_j F_k(j)\, B_k(j)}$$

Thus, given that $\mathbf{S}^n = \mathbf{s}_n$, the optimal predictor of $X_k$ is the value of $j$ that maximizes
$F_k(j)\, B_k(j)$.

A different variant of the prediction problem arises when we regard the
sequence of states as a single entity. In this situation, our objective is to choose
that sequence of states whose conditional probability, given the sequence of signals,
is maximal. For instance, in signal processing, while $X_1, \ldots, X_n$ might be
the actual message sent, $S_1, \ldots, S_n$ would be what is received, and so the objective
would be to predict the actual message in its entirety.

Letting $\mathbf{X}_k = (X_1, \ldots, X_k)$ be the vector of the first $k$ states, the problem
of interest is to find the sequence of states $i_1, \ldots, i_n$ that maximizes
$P\{\mathbf{X}_n = (i_1, \ldots, i_n) \mid \mathbf{S}^n = \mathbf{s}_n\}$. Because

$$P\{\mathbf{X}_n = (i_1, \ldots, i_n) \mid \mathbf{S}^n = \mathbf{s}_n\} = \frac{P\{\mathbf{X}_n = (i_1, \ldots, i_n), \mathbf{S}^n = \mathbf{s}_n\}}{P\{\mathbf{S}^n = \mathbf{s}_n\}}$$

this is equivalent to finding the sequence of states $i_1, \ldots, i_n$ that maximizes
$P\{\mathbf{X}_n = (i_1, \ldots, i_n), \mathbf{S}^n = \mathbf{s}_n\}$.
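To solve the preceding problem let, for $k \leq n$,

$$V_k(j) = \max_{i_1, \ldots, i_{k-1}} P\{\mathbf{X}_{k-1} = (i_1, \ldots, i_{k-1}), X_k = j, \mathbf{S}^k = \mathbf{s}_k\}$$

To recursively solve for $V_k(j)$, use that

$$\begin{aligned}
V_k(j) &= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, X_k = j, \mathbf{S}^k = \mathbf{s}_k\} \\
&= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}, X_k = j, S_k = s_k\} \\
&= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&\qquad \times P\{X_k = j, S_k = s_k \mid \mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&= \max_i \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&\qquad \times P\{X_k = j, S_k = s_k \mid X_{k-1} = i\} \\
&= \max_i P\{X_k = j, S_k = s_k \mid X_{k-1} = i\} \\
&\qquad \times \max_{i_1, \ldots, i_{k-2}} P\{\mathbf{X}_{k-2} = (i_1, \ldots, i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}_{k-1}\} \\
&= \max_i P_{i,j}\, p(s_k \mid j)\, V_{k-1}(i) \\
&= p(s_k \mid j) \max_i P_{i,j}\, V_{k-1}(i)
\end{aligned} \tag{4.37}$$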
Starting with

$$V_1(j) = P\{X_1 = j, S_1 = s_1\} = p_j\, p(s_1 \mid j)$$

we now use the recursive identity (4.37) to determine $V_2(j)$ for each $j$; then $V_3(j)$
for each $j$; and so on, up to $V_n(j)$ for each $j$.

To obtain the maximizing sequence of states, we work in the reverse direction.
Let $j_n$ be the value (or any of the values if there are more than one) of $j$ that
maximizes $V_n(j)$. Thus $j_n$ is the final state of a maximizing state sequence. Also,
for $k < n$, let $i_k(j)$ be a value of $i$ that maximizes $P_{i,j}\, V_k(i)$. Then

$$\begin{aligned}
\max_{i_1, \ldots, i_n} P\{\mathbf{X}_n = (i_1, \ldots, i_n), \mathbf{S}^n = \mathbf{s}_n\}
&= \max_j V_n(j) \\
&= V_n(j_n) \\
&= \max_{i_1, \ldots, i_{n-1}} P\{\mathbf{X}_n = (i_1, \ldots, i_{n-1}, j_n), \mathbf{S}^n = \mathbf{s}_n\} \\
&= p(s_n \mid j_n) \max_i P_{i, j_n} V_{n-1}(i) \\
&= p(s_n \mid j_n)\, P_{i_{n-1}(j_n),\, j_n}\, V_{n-1}(i_{n-1}(j_n))
\end{aligned}$$

Thus, $i_{n-1}(j_n)$ is the next to last state of the maximizing sequence. Continuing
in this manner, the second from the last state of the maximizing sequence is
$i_{n-2}(i_{n-1}(j_n))$, and so on.

The preceding approach to finding the most likely sequence of states given a
prescribed sequence of signals is known as the Viterbi Algorithm.
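The recursion (4.37) together with the backtracking step can be sketched as follows, again on the model of Examples 4.42 and 4.43. The function name `viterbi` and the zero-based state indexing are choices made here; ties, if any, are broken in favor of the lowest index, which is one of the permitted choices.

```python
def viterbi(p0, P, emit, signals):
    """Return a most likely state sequence (0-based) and its joint probability."""
    N = len(p0)
    V = [p0[j] * emit[j][signals[0]] for j in range(N)]  # V_1(j) = p_j p(s_1 | j)
    back = []  # back[k][j] holds a maximizing previous state, i_k(j) in the text
    for s in signals[1:]:
        arg = [max(range(N), key=lambda i: P[i][j] * V[i]) for j in range(N)]
        V = [emit[j][s] * P[arg[j]][j] * V[arg[j]] for j in range(N)]  # recursion (4.37)
        back.append(arg)
    j = max(range(N), key=lambda state: V[state])  # final state j_n
    best = V[j]
    path = [j]
    for arg in reversed(back):  # work in the reverse direction
        j = arg[j]
        path.append(j)
    path.reverse()
    return path, best

p0 = [0.8, 0.2]
P = [[0.9, 0.1], [0.0, 1.0]]
emit = [{"a": 0.99, "u": 0.01}, {"a": 0.96, "u": 0.04}]

path, best = viterbi(p0, P, emit, ["a", "u", "a"])
print([state + 1 for state in path], best)
```

For the signals $a, u, a$ this returns the sequence 2, 2, 2 with joint probability $(0.2)(0.96)(0.04)(0.96) = 0.0073728$. Note that the individually most likely state at time 3 is also state 2, but in general the sequence found here need not agree with the state-by-state predictors of the previous paragraph.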


Exercises 


*1. Three white and three black balls are distributed in two urns in such a way that 
each contains three balls. We say that the system is in state i, i = 0,1, 2, 3, if the first 
urn contains i white balls. At each step, we draw one ball from each urn and place 
the ball drawn from the first urn into the second, and conversely with the ball from 
the second urn. Let $X_n$ denote the state of the system after the $n$th step. Explain
why $\{X_n, n = 0, 1, 2, \ldots\}$ is a Markov chain and calculate its transition probability
matrix.


2. Suppose that whether or not it rains today depends on previous weather conditions 
through the last three days. Show how this system may be analyzed by using a 
Markov chain. How many states are needed? 


3. In Exercise 2, suppose that if it has rained for the past three days, then it will rain 
today with probability 0.8; if it did not rain for any of the past three days, then it 
will rain today with probability 0.2; and in any other case the weather today will, 
with probability 0.6, be the same as the weather yesterday. Determine P for this 
Markov chain. 
4. Consider a process $\{X_n, n = 0, 1, \ldots\}$, which takes on the values 0, 1, or 2. Suppose

$$
P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} =
\begin{cases}
P^{\mathrm{I}}_{ij}, & \text{when } n \text{ is even} \\
P^{\mathrm{II}}_{ij}, & \text{when } n \text{ is odd}
\end{cases}
$$

where $\sum_{j=0}^{2} P^{\mathrm{I}}_{ij} = \sum_{j=0}^{2} P^{\mathrm{II}}_{ij} = 1$, $i = 0, 1, 2$. Is $\{X_n, n \geq 0\}$ a Markov chain? If not,
then show how, by enlarging the state space, we may transform it into a Markov
chain.


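5. A Markov chain $\{X_n, n \geq 0\}$ with states 0, 1, 2, has the transition probability matrix

$$
\begin{bmatrix}
\frac12 & \frac13 & \frac16 \\
0 & \frac13 & \frac23 \\
\frac12 & 0 & \frac12
\end{bmatrix}
$$

If $P\{X_0 = 0\} = P\{X_0 = 1\} = \frac14$, find $E[X_3]$.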


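6. Let the transition probability matrix of a two-state Markov chain be given, as in
Example 4.2, by

$$
P = \begin{bmatrix} p & 1-p \\ 1-p & p \end{bmatrix}
$$

Show by mathematical induction that

$$
P^{(n)} = \begin{bmatrix}
\frac12 + \frac12 (2p-1)^n & \frac12 - \frac12 (2p-1)^n \\
\frac12 - \frac12 (2p-1)^n & \frac12 + \frac12 (2p-1)^n
\end{bmatrix}
$$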
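
Before attempting the induction for the two-state chain with $P = \begin{bmatrix} p & 1-p \\ 1-p & p \end{bmatrix}$, it can be reassuring to check the claimed closed form, with diagonal entries $\frac12 + \frac12(2p-1)^n$ and off-diagonal entries $\frac12 - \frac12(2p-1)^n$, numerically. The following is a minimal sketch, not a proof; the function names are hypothetical and the value $p = 0.3$ is an arbitrary choice.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step_matrix(p, n):
    """n-step transition matrix of the chain with P = [[p, 1-p], [1-p, p]]."""
    P = [[p, 1 - p], [1 - p, p]]
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity = zero-step matrix
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def closed_form(p, n):
    """Closed form stated in the exercise."""
    d = 0.5 * (2 * p - 1) ** n
    return [[0.5 + d, 0.5 - d], [0.5 - d, 0.5 + d]]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, max_abs_diff(n_step_matrix(0.3, n), closed_form(0.3, n)))
```

The printed differences should be at the level of floating-point rounding; agreement for a few values of $n$ is of course no substitute for the induction argument the exercise asks for.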


7. In Example 4.4 suppose that it has rained neither yesterday nor the day before
yesterday. What is the probability that it will rain tomorrow?


8. Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2 has
probability 0.6 of coming up heads. If the coin flipped today comes up heads, then we
select coin 1 to flip tomorrow, and if it comes up tails, then we select coin 2 to flip
tomorrow. If the coin initially flipped is equally likely to be coin 1 or coin 2, then
what is the probability that the coin flipped on the third day after the initial flip
is coin 1? Suppose that the coin flipped on Monday comes up heads. What is the
probability that the coin flipped on Friday of the same week also comes up heads?


9. If in Example 4.10 we had defined $X_n$ to equal 1 if the $n$th selection were red and
to equal 0 if it were blue, would $X_n, n \geq 1$ be a Markov chain?

10. In Example 4.3, Gary is currently in a cheerful mood. What is the probability that
he is not in a glum mood on any of the following three days?

11. In Example 4.3, Gary was in a glum mood four days ago. Given that he hasn’t felt
cheerful in a week, what is the probability he is feeling glum today?
12. For a Markov chain $\{X_n, n \geq 0\}$ with transition probabilities $P_{i,j}$, consider the
conditional probability that $X_n = m$ given that the chain started at time 0 in state
$i$ and has not yet entered state $r$ by time $n$, where $r$ is a specified state not equal
to either $i$ or $m$. We are interested in whether this conditional probability is equal
to the $n$ stage transition probability of a Markov chain whose state space does not
include state $r$ and whose transition probabilities are

$$Q_{i,j} = \frac{P_{i,j}}{1 - P_{i,r}}, \quad i, j \neq r$$

Either prove the equality

$$P\{X_n = m \mid X_0 = i, X_k \neq r, k = 1, \ldots, n\} = Q^n_{i,m}$$

or construct a counterexample.


13. Let P be the transition probability matrix of a Markov chain. Argue that if for some
positive integer $r$, $P^r$ has all positive entries, then so does $P^n$, for all integers $n \geq r$.


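14. Specify the classes of the following Markov chains, and determine whether they are
transient or recurrent:

$$
P_1 = \begin{bmatrix}
0 & \frac12 & \frac12 \\
\frac12 & 0 & \frac12 \\
\frac12 & \frac12 & 0
\end{bmatrix},
\qquad
P_2 = \begin{bmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 \\
\frac12 & \frac12 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
$$

$$
P_3 = \begin{bmatrix}
\frac12 & 0 & \frac12 & 0 & 0 \\
\frac14 & \frac12 & \frac14 & 0 & 0 \\
\frac12 & 0 & \frac12 & 0 & 0 \\
0 & 0 & 0 & \frac12 & \frac12 \\
0 & 0 & 0 & \frac12 & \frac12
\end{bmatrix},
\qquad
P_4 = \begin{bmatrix}
\frac14 & \frac34 & 0 & 0 & 0 \\
\frac12 & \frac12 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & \frac13 & \frac23 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
$$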


15. Prove that if the number of states in a Markov chain is M, and if state j can be
reached from state i, then it can be reached in M steps or less.


*16. Show that if state $i$ is recurrent and state $i$ does not communicate with state $j$,
then $P_{ij} = 0$. This implies that once a process enters a recurrent class of states it
can never leave that class. For this reason, a recurrent class is often referred to as a
closed class.


17. For the random walk of Example 4.18 use the strong law of large numbers to give
another proof that the Markov chain is transient when $p \neq \frac12$.

Hint: Note that the state at time $n$ can be written as $\sum_{i=1}^{n} Y_i$ where the $Y_i$s are
independent and $P\{Y_i = 1\} = p = 1 - P\{Y_i = -1\}$. Argue that if $p > \frac12$ then, by
the strong law of large numbers, $\sum_{i=1}^{n} Y_i \to \infty$ as $n \to \infty$ and hence the initial state
0 can be visited only finitely often, and hence must be transient. A similar argument
holds when $p < \frac12$.
18. Coin 1 comes up heads with probability 0.6 and coin 2 with probability 0.5. A coin
is continually flipped until it comes up tails, at which time that coin is put aside and
we start flipping the other one.

(a) What proportion of flips use coin 1?

(b) If we start the process with coin 1 what is the probability that coin 2 is used
on the fifth flip?


19. For Example 4.4, calculate the proportion of days that it rains.


20. A transition probability matrix P is said to be doubly stochastic if the sum over
each column equals one; that is,

$$\sum_i P_{ij} = 1, \quad \text{for all } j$$

If such a chain is irreducible and aperiodic and consists of $M + 1$ states $0, 1, \ldots, M$,
show that the limiting probabilities are given by

$$\pi_j = \frac{1}{M+1}, \quad j = 0, 1, \ldots, M$$


21. A DNA nucleotide has any of four values. A standard model for a mutational
change of the nucleotide at a specific location is a Markov chain model that
supposes that in going from period to period the nucleotide does not change with
probability $1 - 3\alpha$, and if it does change then it is equally likely to change to any
of the other three values, for some $0 < \alpha < \frac13$.

(a) Show that $P^n_{1,1} = \frac14 + \frac34 (1 - 4\alpha)^n$.
(b) What is the long-run proportion of time the chain is in each state?


22. Let $Y_n$ be the sum of $n$ independent rolls of a fair die. Find

$$\lim_{n \to \infty} P\{Y_n \text{ is a multiple of } 13\}$$

Hint: Define an appropriate Markov chain and apply the results of Exercise 20.


23. In a good weather year the number of storms is Poisson distributed with mean 1; in
a bad year it is Poisson distributed with mean 3. Suppose that any year’s weather
conditions depend on past years only through the previous year’s condition. Suppose
that a good year is equally likely to be followed by either a good or a bad year,
and that a bad year is twice as likely to be followed by a bad year as by a good
year. Suppose that last year (call it year 0) was a good year.

(a) Find the expected total number of storms in the next two years (that is, in
years 1 and 2).

(b) Find the probability there are no storms in year 3.

(c) Find the long-run average number of storms per year.


24. Consider three urns, one colored red, one white, and one blue. The red urn contains
1 red and 4 blue balls; the white urn contains 3 white balls, 2 red balls, and 2 blue
balls; the blue urn contains 4 white balls, 3 red balls, and 2 blue balls. At the initial
stage, a ball is randomly selected from the red urn and then returned to that urn.
At every subsequent stage, a ball is randomly selected from the urn whose color is
the same as that of the ball previously selected and is then returned to that urn. In
the long run, what proportion of the selected balls are red? What proportion are
white? What proportion are blue?


25. Each morning an individual leaves his house and goes for a run. He is equally likely
to leave either from his front or back door. Upon leaving the house, he chooses a
pair of running shoes (or goes running barefoot if there are no shoes at the door
from which he departed). On his return he is equally likely to enter, and leave his
running shoes, either by the front or back door. If he owns a total of k pairs of
running shoes, what proportion of the time does he run barefooted?


26. Consider the following approach to shuffling a deck of $n$ cards. Starting with any
initial ordering of the cards, one of the numbers $1, 2, \ldots, n$ is randomly chosen in
such a manner that each one is equally likely to be selected. If number $i$ is chosen,
then we take the card that is in position $i$ and put it on top of the deck; that is,
we put that card in position 1. We then repeatedly perform the same operation.
Show that, in the limit, the deck is perfectly shuffled in the sense that the resultant
ordering is equally likely to be any of the $n!$ possible orderings.


27. Each individual in a population of size N is, in each period, either active or inactive.
If an individual is active in a period then, independent of all else, that individual
will be active in the next period with probability $\alpha$. Similarly, if an individual is
inactive in a period then, independent of all else, that individual will be inactive in
the next period with probability $\beta$. Let $X_n$ denote the number of individuals that
are active in period $n$.

(a) Argue that $X_n, n \geq 0$ is a Markov chain.

(b) Find $E[X_n \mid X_0 = i]$.

(c) Derive an expression for its transition probabilities.

(d) Find the long-run proportion of time that exactly $j$ people are active.


Hint for (d): Consider first the case where N = 1.


28. Every time that the team wins a game, it wins its next game with probability 0.8;
every time it loses a game, it wins its next game with probability 0.3. If the team
wins a game, then it has dinner together with probability 0.7, whereas if the team
loses then it has dinner together with probability 0.2. What proportion of games
result in a team dinner?

29. An organization has N employees where N is a large number. Each employee has
one of three possible job classifications and changes classifications (independently)
according to a Markov chain with transition probabilities

$$
\begin{bmatrix}
0.7 & 0.2 & 0.1 \\
0.2 & 0.6 & 0.2 \\
0.1 & 0.4 & 0.5
\end{bmatrix}
$$

What percentage of employees are in each classification?


30. Three out of every four trucks on the road are followed by a car, while only one
out of every five cars is followed by a truck. What fraction of vehicles on the road
are trucks?


31. A certain town never has two sunny days in a row. Each day is classified as being
either sunny, cloudy (but dry), or rainy. If it is sunny one day, then it is equally
likely to be either cloudy or rainy the next day. If it is rainy or cloudy one day, then
there is one chance in two that it will be the same the next day, and if it changes
then it is equally likely to be either of the other two possibilities. In the long run,
what proportion of days are sunny? What proportion are cloudy?


32. Each of two switches is either on or off during a day. On day $n$, each switch will
independently be on with probability

$$[1 + \text{number of on switches during day } n - 1]/4$$

For instance, if both switches are on during day $n - 1$, then each will independently
be on during day $n$ with probability 3/4. What fraction of days are both switches
on? What fraction are both off?


33. A professor continually gives exams to her students. She can give three possible
types of exams, and her class is graded as either having done well or badly.
Let $p_i$ denote the probability that the class does well on a type $i$ exam, and suppose
that $p_1 = 0.3$, $p_2 = 0.6$, and $p_3 = 0.9$. If the class does well on an exam,
then the next exam is equally likely to be any of the three types. If the class does
badly, then the next exam is always type 1. What proportion of exams are type
$i$, $i = 1, 2, 3$?

34. A flea moves around the vertices of a triangle in the following manner: Whenever
it is at vertex $i$ it moves to its clockwise neighbor vertex with probability $p_i$ and to
the counterclockwise neighbor with probability $q_i = 1 - p_i$, $i = 1, 2, 3$.

(a) Find the proportion of time that the flea is at each of the vertices.

(b) How often does the flea make a counterclockwise move that is then followed
by five consecutive clockwise moves?


35. Consider a Markov chain with states 0, 1, 2, 3, 4. Suppose $P_{0,4} = 1$; and suppose
that when the chain is in state $i$, $i > 0$, the next state is equally likely to be any of
the states $0, 1, \ldots, i - 1$. Find the limiting probabilities of this Markov chain.


36. The state of a process changes daily according to a two-state Markov chain. If the
process is in state $i$ during one day, then it is in state $j$ the following day with
probability $P_{ij}$, where

$$P_{00} = 0.4, \quad P_{01} = 0.6, \quad P_{10} = 0.2, \quad P_{11} = 0.8$$

Every day a message is sent. If the state of the Markov chain that day is $i$ then
the message sent is “good” with probability $p_i$ and is “bad” with probability
$q_i = 1 - p_i$, $i = 0, 1$.

(a) If the process is in state 0 on Monday, what is the probability that a good
message is sent on Tuesday?

(b) If the process is in state 0 on Monday, what is the probability that a good
message is sent on Friday?

(c) In the long run, what proportion of messages are good?

(d) Let $Y_n$ equal 1 if a good message is sent on day $n$ and let it equal 2 otherwise.
Is $\{Y_n, n \geq 1\}$ a Markov chain? If so, give its transition probability matrix. If
not, briefly explain why not.
37. Show that the stationary probabilities for the Markov chain having transition
probabilities $P_{ij}$ are also the stationary probabilities for the Markov chain whose
transition probabilities $Q_{ij}$ are given by

$$Q_{ij} = P^k_{ij}$$

for any specified positive integer $k$.


38. Recall that state $i$ is said to be positive recurrent if $m_i < \infty$, where $m_i$ is the
expected number of transitions until the Markov chain, starting in state $i$, makes
a transition back into that state. Because $\pi_i$, the long-run proportion of time the
Markov chain, starting in state $i$, spends in state $i$, satisfies

$$\pi_i = \frac{1}{m_i}$$

it follows that state $i$ is positive recurrent if and only if $\pi_i > 0$. Suppose that state $i$
is positive recurrent and that state $i$ communicates with state $j$. Show that state $j$ is
also positive recurrent by arguing that there is an integer $n$ such that

$$\pi_j \geq \pi_i P^n_{ij} > 0$$


39. Recall that a recurrent state that is not positive recurrent is called null recurrent.
Use the result of Exercise 38 to prove that null recurrence is a class property. That
is, if state $i$ is null recurrent and state $i$ communicates with state $j$, show that state
$j$ is also null recurrent.


40. It follows from the argument made in Exercise 38 that state $i$ is null recurrent if it
is recurrent and $\pi_i = 0$. Consider the one-dimensional symmetric random walk of
Example 4.18.

(a) Argue that $\pi_i = \pi_0$ for all $i$.

(b) Argue that all states are null recurrent.


41. Let $\pi_i$ denote the long-run proportion of time a given irreducible Markov chain is
in state $i$.

(a) Explain why $\pi_i$ is also the proportion of transitions that are into state $i$ as well
as being the proportion of transitions that are from state $i$.

(b) $\pi_i P_{ij}$ represents the proportion of transitions that satisfy what property?

(c) $\sum_i \pi_i P_{ij}$ represents the proportion of transitions that satisfy what property?

(d) Using the preceding explain why

$$\pi_j = \sum_i \pi_i P_{ij}$$


42. Let $A$ be a set of states, and let $A^c$ be the remaining states.

(a) What is the interpretation of

$$\sum_{i \in A} \sum_{j \in A^c} \pi_i P_{ij}?$$

(b) What is the interpretation of

$$\sum_{i \in A^c} \sum_{j \in A} \pi_i P_{ij}?$$

(c) Explain the identity

$$\sum_{i \in A} \sum_{j \in A^c} \pi_i P_{ij} = \sum_{i \in A^c} \sum_{j \in A} \pi_i P_{ij}$$

43. Each day, one of $n$ possible elements is requested, the $i$th one with probability
$P_i$, $i \geq 1$, $\sum_{i=1}^{n} P_i = 1$. These elements are at all times arranged in an ordered list that
is revised as follows: The element selected is moved to the front of the list with the
relative positions of all the other elements remaining unchanged. Define the state
at any time to be the list ordering at that time and note that there are $n!$ possible
states.

(a) Argue that the preceding is a Markov chain.

(b) For any state $i_1, \ldots, i_n$ (which is a permutation of $1, 2, \ldots, n$), let $\pi(i_1, \ldots, i_n)$
denote the limiting probability. In order for the state to be $i_1, \ldots, i_n$, it is
necessary for the last request to be for $i_1$, the last non-$i_1$ request for $i_2$, the last
non-$i_1$ or $i_2$ request for $i_3$, and so on. Hence, it appears intuitive that

$$\pi(i_1, \ldots, i_n) = P_{i_1} \frac{P_{i_2}}{1 - P_{i_1}} \frac{P_{i_3}}{1 - P_{i_1} - P_{i_2}} \cdots \frac{P_{i_{n-1}}}{1 - P_{i_1} - \cdots - P_{i_{n-2}}}$$

Verify when $n = 3$ that the preceding are indeed the limiting probabilities.

44. Suppose that a population consists of a fixed number, say, $m$, of genes in any
generation. Each gene is one of two possible genetic types. If exactly $i$ (of the $m$)
genes of any generation are of type 1, then the next generation will have $j$ type 1
(and $m - j$ type 2) genes with probability

$$\binom{m}{j} \left(\frac{i}{m}\right)^{j} \left(\frac{m - i}{m}\right)^{m - j}$$

Let $X_n$ denote the number of type 1 genes in the $n$th generation, and assume
that $X_0 = i$.

(a) Find $E[X_n]$.

(b) What is the probability that eventually all the genes will be type 1?

45. Consider an irreducible finite Markov chain with states $0, 1, \ldots, N$.


(a) Starting in state $i$, what is the probability the process will ever visit state $j$?
Explain!

(b) Let $x_i = P\{\text{visit state } N \text{ before state } 0 \mid \text{start in } i\}$. Compute a set of linear
equations that the $x_i$ satisfy, $i = 0, 1, \ldots, N$.

(c) If $\sum_j j P_{ij} = i$ for $i = 1, \ldots, N - 1$, show that $x_i = i/N$ is a solution to the
equations in part (b).
46. An individual possesses $r$ umbrellas that he employs in going from his home to office,
and vice versa. If he is at home (the office) at the beginning (end) of a day and it is
raining, then he will take an umbrella with him to the office (home), provided there
is one to be taken. If it is not raining, then he never takes an umbrella. Assume that,
independent of the past, it rains at the beginning (end) of a day with probability $p$.
(a) Define a Markov chain with $r + 1$ states, which will help us to determine the
proportion of time that our man gets wet. (Note: He gets wet if it is raining,
and all umbrellas are at his other location.)
(b) Show that the limiting probabilities are given by

$$
\pi_i = \begin{cases}
\dfrac{q}{r + q}, & \text{if } i = 0 \\
\dfrac{1}{r + q}, & \text{if } i = 1, \ldots, r
\end{cases}
\qquad \text{where } q = 1 - p
$$

(c) What fraction of time does our man get wet?
(d) When $r = 3$, what value of $p$ maximizes the fraction of time he gets wet?

*47. Let $\{X_n, n \geq 0\}$ denote an ergodic Markov chain with limiting probabilities $\pi_i$.
Define the process $\{Y_n, n \geq 1\}$ by $Y_n = (X_{n-1}, X_n)$. That is, $Y_n$ keeps track of the
last two states of the original chain. Is $\{Y_n, n \geq 1\}$ a Markov chain? If so, determine
its transition probabilities and find

$$\lim_{n \to \infty} P\{Y_n = (i, j)\}$$

48. Consider a Markov chain in steady state. Say that a $k$ length run of zeroes ends at
time $m$ if

$$X_{m-k-1} \neq 0, \quad X_{m-k} = X_{m-k+1} = \cdots = X_{m-1} = 0, \quad X_m \neq 0$$

Show that the probability of this event is $\pi_0 (P_{0,0})^{k-1} (1 - P_{0,0})^2$, where $\pi_0$ is the
limiting probability of state 0.


49. Let $P^{(1)}$ and $P^{(2)}$ denote transition probability matrices for ergodic Markov chains
having the same state space. Let $\pi^1$ and $\pi^2$ denote the stationary (limiting)
probability vectors for the two chains. Consider a process defined as follows:

(a) $X_0 = 1$. A coin is then flipped and if it comes up heads, then the remaining
states $X_1, \ldots$ are obtained from the transition probability matrix $P^{(1)}$
and if tails from the matrix $P^{(2)}$. Is $\{X_n, n \geq 0\}$ a Markov chain? If $p =
P\{\text{coin comes up heads}\}$, what is $\lim_{n \to \infty} P(X_n = i)$?

(b) $X_0 = 1$. At each stage the coin is flipped and if it comes up heads, then the
next state is chosen according to $P^{(1)}$ and if tails comes up, then it is chosen
according to $P^{(2)}$. In this case do the successive states constitute a Markov
chain? If so, determine the transition probabilities. Show by a counterexample
that the limiting probabilities are not the same as in part (a).


50. In Exercise 8, if today’s flip lands heads, what is the expected number of additional
flips needed until the pattern t, t, h, t, h, t, t occurs?
51. In Example 4.3, Gary is in a cheerful mood today. Find the expected number of
days until he has been glum for three consecutive days.


52. A taxi driver provides service in two zones of a city. Fares picked up in zone A will
have destinations in zone A with probability 0.6 or in zone B with probability 0.4.
Fares picked up in zone B will have destinations in zone A with probability 0.3 or
in zone B with probability 0.7. The driver’s expected profit for a trip entirely in
zone A is 6; for a trip entirely in zone B is 8; and for a trip that involves both zones
is 12. Find the taxi driver’s average profit per trip.


53. Find the average premium received per policyholder of the insurance company of
Example 4.27 if $\lambda = 1/4$ for one-third of its clients, and $\lambda = 1/2$ for two-thirds of
its clients.


54. Consider the Ehrenfest urn model in which $M$ molecules are distributed between
two urns, and at each time point one of the molecules is chosen at random
and is then removed from its urn and placed in the other one. Let $X_n$ denote
the number of molecules in urn 1 after the $n$th switch and let $\mu_n = E[X_n]$.
Show that

(a) $\mu_{n+1} = 1 + (1 - 2/M)\mu_n$.

(b) Use (a) to prove that

$$\mu_n = \frac{M}{2} + \left(\frac{M - 2}{M}\right)^{n} \left(E[X_0] - \frac{M}{2}\right)$$


55. Consider a population of individuals each of whom possesses two genes that can be
either type A or type a. Suppose that in outward appearance type A is dominant and
type a is recessive. (That is, an individual will have only the outward characteristics
of the recessive gene if its pair is aa.) Suppose that the population has stabilized,
and the percentages of individuals having respective gene pairs AA, aa, and Aa are
$p$, $q$, and $r$. Call an individual dominant or recessive depending on the outward
characteristics it exhibits. Let $S_{11}$ denote the probability that an offspring of two
dominant parents will be recessive; and let $S_{10}$ denote the probability that the
offspring of one dominant and one recessive parent will be recessive. Compute $S_{11}$
and $S_{10}$ to show that $S_{11} = S_{10}^2$. (The quantities $S_{10}$ and $S_{11}$ are known in the
genetics literature as Snyder’s ratios.)


56. Suppose that on each play of the game a gambler either wins 1 with probability $p$
or loses 1 with probability $1 - p$. The gambler continues betting until she or he is
either up $n$ or down $m$. What is the probability that the gambler quits a winner?


57. A particle moves among $n + 1$ vertices that are situated on a circle in the following
manner. At each step it moves one step either in the clockwise direction with
probability $p$ or the counterclockwise direction with probability $q = 1 - p$. Starting at
a specified state, call it state 0, let $T$ be the time of the first return to state 0. Find
the probability that all states have been visited by time $T$.


Hint: Condition on the initial transition and then use results from the gambler’s
ruin problem.
58. In the gambler’s ruin problem of Section 4.5.1, suppose the gambler’s fortune is
presently $i$, and suppose that we know that the gambler’s fortune will eventually
reach $N$ (before it goes to 0). Given this information, show that the probability he
wins the next gamble is

$$
\begin{cases}
\dfrac{p\,[1 - (q/p)^{i+1}]}{1 - (q/p)^{i}}, & \text{if } p \neq \frac12 \\[2ex]
\dfrac{i + 1}{2i}, & \text{if } p = \frac12
\end{cases}
$$

Hint: The probability we want is

$$
P\{X_{n+1} = i + 1 \mid X_n = i, \lim_{m \to \infty} X_m = N\}
= \frac{P\{X_{n+1} = i + 1, \lim_{m} X_m = N \mid X_n = i\}}{P\{\lim_{m} X_m = N \mid X_n = i\}}
$$

59. For the gambler’s ruin model of Section 4.5.1, let $M_i$ denote the mean number of
games that must be played until the gambler either goes broke or reaches a fortune
of $N$, given that he starts with $i$, $i = 0, 1, \ldots, N$. Show that $M_i$ satisfies

$$M_0 = M_N = 0; \qquad M_i = 1 + pM_{i+1} + qM_{i-1}, \quad i = 1, \ldots, N - 1$$


60. Solve the equations given in Exercise 59 to obtain

$$
M_i = \begin{cases}
i(N - i), & \text{if } p = \frac12 \\[1ex]
\dfrac{i}{q - p} - \dfrac{N}{q - p}\, \dfrac{1 - (q/p)^{i}}{1 - (q/p)^{N}}, & \text{if } p \neq \frac12
\end{cases}
$$
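
As a numerical cross-check on the closed form for the mean duration $M_i$ (namely $i(N-i)$ when $p = \frac12$, and $\frac{i}{q-p} - \frac{N}{q-p}\,\frac{1-(q/p)^i}{1-(q/p)^N}$ otherwise), the linear equations $M_0 = M_N = 0$, $M_i = 1 + pM_{i+1} + qM_{i-1}$ can be solved directly and compared with the formula. The sketch below is one possible way to do this, not the derivation the exercise asks for; the function names are hypothetical, and the tridiagonal solver (the Thomas algorithm) is simply a convenient choice.

```python
def mean_duration_numeric(N, p):
    """Solve M_0 = M_N = 0, M_i = 1 + p*M_{i+1} + q*M_{i-1}, i = 1..N-1.

    The equations form the tridiagonal system
        -q*M_{i-1} + M_i - p*M_{i+1} = 1,
    solved here by the Thomas algorithm. Returns [M_0, ..., M_N].
    """
    q = 1.0 - p
    m = N - 1                      # number of unknowns M_1..M_{N-1}
    if m < 1:
        return [0.0] * (N + 1)
    c_star = [0.0] * m             # modified super-diagonal
    d_star = [0.0] * m             # modified right-hand side
    c_star[0] = -p
    d_star[0] = 1.0
    for k in range(1, m):          # forward elimination
        denom = 1.0 + q * c_star[k - 1]
        c_star[k] = -p / denom
        d_star[k] = (1.0 + q * d_star[k - 1]) / denom
    M = [0.0] * (N + 1)
    M[m] = d_star[m - 1]           # unknown with index k is M_{k+1}
    for k in range(m - 2, -1, -1): # back substitution
        M[k + 1] = d_star[k] - c_star[k] * M[k + 2]
    return M


def mean_duration_closed_form(N, p, i):
    """Closed form for M_i stated in the exercise."""
    q = 1.0 - p
    if abs(p - 0.5) < 1e-12:
        return float(i * (N - i))
    r = q / p
    return i / (q - p) - (N / (q - p)) * (1 - r ** i) / (1 - r ** N)


if __name__ == "__main__":
    N = 10
    for p in (0.5, 0.6):
        numeric = mean_duration_numeric(N, p)
        for i in range(N + 1):
            print(p, i, numeric[i], mean_duration_closed_form(N, p, i))
```

The two columns printed for each $i$ should agree up to rounding error for any $0 < p < 1$.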


61. Suppose in the gambler’s ruin problem that the probability of winning a bet depends
on the gambler’s present fortune. Specifically, suppose that $\alpha_i$ is the probability that
the gambler wins a bet when his or her fortune is $i$. Given that the gambler’s initial
fortune is $i$, let $P(i)$ denote the probability that the gambler’s fortune reaches $N$
before 0.

(a) Derive a formula that relates $P(i)$ to $P(i - 1)$ and $P(i + 1)$.

(b) Using the same approach as in the gambler’s ruin problem, solve the equation 
of part (a) for P(i). 

(c) Suppose that i balls are initially in urn 1 and N — i are in urn 2, and suppose 
that at each stage one of the N balls is randomly chosen, taken from whichever 
urn it is in, and placed in the other urn. Find the probability that the first urn 
becomes empty before the second. 

*62. Consider the particle from Exercise 57. What is the expected number of steps the 
particle takes to return to the starting position? What is the probability that all 
other positions are visited before the particle returns to its starting state? 
63. For the Markov chain with states 1, 2, 3, 4 whose transition probability matrix P
is as specified below find $f_{i3}$ and $s_{i3}$ for $i = 1, 2, 3$.

$$
P = \begin{bmatrix}
0.4 & 0.2 & 0.1 & 0.3 \\
0.1 & 0.5 & 0.2 & 0.2 \\
0.3 & 0.4 & 0.2 & 0.1 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

64. Consider a branching process having $\mu < 1$. Show that if $X_0 = 1$, then the expected
number of individuals that ever exist in this population is given by $1/(1 - \mu)$. What
if $X_0 = n$?

65. In a branching process having $X_0 = 1$ and $\mu > 1$, prove that $\pi_0$ is the smallest
positive number satisfying Equation (4.20).

Hint: Let $\pi$ be any solution of $\pi = \sum_{j=0}^{\infty} \pi^j P_j$. Show by mathematical induction
that $\pi \geq P\{X_n = 0\}$ for all $n$, and let $n \to \infty$. In using the induction argue that

$$P\{X_n = 0\} = \sum_{j=0}^{\infty} (P\{X_{n-1} = 0\})^j P_j$$

66. For a branching process, calculate $\pi_0$ when
(a) $P_0 = \frac14$, $P_2 = \frac34$.
(b) $P_0 = \frac14$, $P_1 = \frac12$, $P_2 = \frac14$.
(c) $P_0 = \frac16$, $P_1 = \frac12$, $P_3 = \frac13$.

67. At all times, an urn contains N balls—some white balls and some black balls. At 
each stage, a coin having probability p,0 < p < 1, of landing heads is flipped. If 
heads appears, then a ball is chosen at random from the urn and is replaced by 
a white ball; if tails appears, then a ball is chosen from the urn and is replaced 
by a black ball. Let X,, denote the number of white balls in the urn after the 
nth stage. 

(a) Is $\{X_n, n \geq 0\}$ a Markov chain? If so, explain why.

(b) What are its classes? What are their periods? Are they transient or recurrent?

(c) Compute the transition probabilities $P_{ij}$.

(d) Let N = 2. Find the proportion of time in each state. 

(e) Based on your answer in part (d) and your intuition, guess the answer for the 
limiting probability in the general case. 

(f) Prove your guess in part (e) either by showing that Equation (4.7) is satisfied 
or by using the results of Example 4.35. 

(g) If p = 1, what is the expected time until there are only white balls in the urn 
if initially there are i white and N — i black? 

*68. (a) Show that the limiting probabilities of the reversed Markov chain are the same
as for the forward chain by showing that they satisfy the equations

$$\pi_j = \sum_i \pi_i Q_{ij}$$

(b) Give an intuitive explanation for the result of part (a).
69. $M$ balls are initially distributed among $m$ urns. At each stage one of the balls is
selected at random, taken from whichever urn it is in, and then placed, at random,
in one of the other $m - 1$ urns. Consider the Markov chain whose state at any time
is the vector $(n_1, \ldots, n_m)$ where $n_i$ denotes the number of balls in urn $i$. Guess at the
limiting probabilities for this Markov chain and then verify your guess and show
at the same time that the Markov chain is time reversible.


70. A total of $m$ white and $m$ black balls are distributed among two urns, with each urn
containing $m$ balls. At each stage, a ball is randomly selected from each urn and
the two selected balls are interchanged. Let $X_n$ denote the number of black balls in
urn 1 after the $n$th interchange.

(a) Give the transition probabilities of the Markov chain $X_n, n \geq 0$.

(b) Without any computations, what do you think are the limiting probabilities
of this chain?

(c) Find the limiting probabilities and show that the stationary chain is time
reversible.


71. It follows from Theorem 4.2 that for a time reversible Markov chain

$$P_{ij} P_{jk} P_{ki} = P_{ik} P_{kj} P_{ji}, \quad \text{for all } i, j, k$$

It turns out that if the state space is finite and $P_{ij} > 0$ for all $i, j$, then the preceding
is also a sufficient condition for time reversibility. (That is, in this case, we need
only check Equation (4.26) for paths from $i$ to $i$ that have only two intermediate
states.) Prove this.


Hint: Fix $i$ and show that the equations

$$\pi_j P_{jk} = \pi_k P_{kj}$$

are satisfied by $\pi_j = c P_{ij}/P_{ji}$, where $c$ is chosen so that $\sum_j \pi_j = 1$.


72. For a time reversible Markov chain, argue that the rate at which transitions from $i$
to $j$ to $k$ occur must equal the rate at which transitions from $k$ to $j$ to $i$ occur.


73. Show that the Markov chain of Exercise 31 is time reversible.


74. A group of $n$ processors is arranged in an ordered list. When a job arrives, the first
processor in line attempts it; if it is unsuccessful, then the next in line tries it; if it too
is unsuccessful, then the next in line tries it, and so on. When the job is successfully
processed or after all processors have been unsuccessful, the job leaves the system.
At this point we are allowed to reorder the processors, and a new job appears.
Suppose that we use the one-closer reordering rule, which moves the processor that
was successful one closer to the front of the line by interchanging its position with
the one in front of it. If all processors were unsuccessful (or if the processor in the
first position was successful), then the ordering remains the same. Suppose that each
time processor $i$ attempts a job then, independently of anything else, it is successful
with probability $p_i$.

(a) Define an appropriate Markov chain to analyze this model. 

(b) Show that this Markov chain is time reversible. 

(c) Find the long-run probabilities. 
75. A Markov chain is said to be a tree process if

(i) $P_{ij} > 0$ whenever $P_{ji} > 0$,

(ii) for every pair of states $i$ and $j$, $i \neq j$, there is a unique sequence of distinct
states $i = i_0, i_1, \ldots, i_{n-1}, i_n = j$ such that

$$P_{i_k, i_{k+1}} > 0, \quad k = 0, 1, \ldots, n - 1$$

In other words, a Markov chain is a tree process if for every pair of distinct states
$i$ and $j$ there is a unique way for the process to go from $i$ to $j$ without reentering
a state (and this path is the reverse of the unique path from $j$ to $i$). Argue that an
ergodic tree process is time reversible.


76. On a chessboard compute the expected number of plays it takes a knight, starting
in one of the four corners of the chessboard, to return to its initial position if we
assume that at each play it is equally likely to choose any of its legal moves. (No
other pieces are on the board.)


Hint: Make use of Example 4.36.


In a Markov decision problem, another criterion often used, different than the 
expected average return per unit time, is that of the expected discounted return. In 
this criterion we choose a number a, 0 < a < 1, and try to choose a policy so as to 
maximize ELS ya! R(Xi, aj)] (that is, rewards at time 7 are discounted at rate a’). 
Suppose that the initial state is chosen according to the probabilities b;. That is, 


P{Xp =i} =b;, i=1,...,n 


For a given policy B let yjq denote the expected discounted time that the process 
is in state j and action a is chosen. That is, 


CO 
Via = Ep| >) 0" lx, =jan=a) 
n=0 


where for any event A the indicator variable I, is defined by 


1, if A occurs 
IA = 
0, otherwise 


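(a) Show that

$$\sum_a y_{ja} = E\left[\sum_{n=0}^{\infty} \alpha^n I_{\{X_n = j\}}\right]$$

or, in other words, $\sum_a y_{ja}$ is the expected discounted time in state $j$ under $\beta$.
(b) Show that

$$\sum_j \sum_a y_{ja} = \frac{1}{1 - \alpha},$$
$$\sum_a y_{ja} = b_j + \alpha \sum_i \sum_a y_{ia} P_{ij}(a)$$

Hint: For the second equation, use the identity

$$I_{\{X_{n+1} = j\}} = \sum_i \sum_a I_{\{X_n = i,\, a_n = a\}} I_{\{X_{n+1} = j\}}$$

Take expectations of the preceding to obtain

$$E\left[I_{\{X_{n+1} = j\}}\right] = \sum_i \sum_a E\left[I_{\{X_n = i,\, a_n = a\}}\right] P_{ij}(a)$$

(c) Let $\{y_{ja}\}$ be a set of numbers satisfying

$$\sum_j \sum_a y_{ja} = \frac{1}{1 - \alpha},$$
$$\sum_a y_{ja} = b_j + \alpha \sum_i \sum_a y_{ia} P_{ij}(a) \tag{4.38}$$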


Argue that $y_{ja}$ can be interpreted as the expected discounted time that the
process is in state $j$ and action $a$ is chosen when the initial state is chosen
according to the probabilities $b_j$ and the policy $\beta$, given by

$$\beta_i(a) = \frac{y_{ia}}{\sum_a y_{ia}}$$

is employed.
Hint: Derive a set of equations for the expected discounted times when policy $\beta$
is used and show that they are equivalent to Equation (4.38).


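(d) Argue that an optimal policy with respect to the expected discounted return
criterion can be obtained by first solving the linear program

$$\begin{aligned}
\text{maximize} \quad & \sum_j \sum_a y_{ja} R(j, a), \\
\text{such that} \quad & \sum_j \sum_a y_{ja} = \frac{1}{1 - \alpha}, \\
& \sum_a y_{ja} = b_j + \alpha \sum_i \sum_a y_{ia} P_{ij}(a), \\
& y_{ja} \geq 0, \quad \text{all } j, a;
\end{aligned}$$

and then defining the policy $\beta^*$ by

$$\beta_i^*(a) = \frac{y_{ia}^*}{\sum_a y_{ia}^*}$$

where the $y_{ja}^*$ are the solutions of the linear program.
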

78. For the Markov chain of Exercise 5, suppose that $p(s|j)$ is the probability that signal
$s$ is emitted when the underlying Markov chain state is $j$, $j = 0, 1, 2$.
(a) What proportion of emissions are signal $s$?
(b) What proportion of those times in which signal $s$ is emitted is 0 the underlying
state?
79. In Example 4.43, what is the probability that the first 4 items produced are all
acceptable?

The Exponential Distribution and the Poisson Process

5.1 Introduction


In making a mathematical model for a real-world phenomenon it is always
necessary to make certain simplifying assumptions so as to render the mathematics
tractable. On the other hand, we cannot make too many simplifying
assumptions, for then our conclusions, obtained from the mathematical model,
would not be applicable to the real-world situation. In short, we must
make enough simplifying assumptions to enable us to handle the mathematics
but not so many that the mathematical model no longer resembles the real-world
phenomenon. One simplifying assumption that is often made is to assume that
certain random variables are exponentially distributed. The reason for this is that
the exponential distribution is both relatively easy to work with and often a
good approximation to the actual distribution.

The property of the exponential distribution that makes it easy to analyze is
that it does not deteriorate with time. By this we mean that if the lifetime of an
item is exponentially distributed, then an item that has been in use for ten (or any
number of) hours is as good as a new item with regard to the amount of time
remaining until the item fails. This will be formally defined in Section 5.2 where
it will be shown that the exponential is the only distribution that possesses this
property.

In Section 5.3 we shall study counting processes with an emphasis on a kind
of counting process known as the Poisson process. Among the things we
shall discover about this process is its intimate connection with the exponential
distribution.


5.2 The Exponential Distribution 


5.2.1 Definition 


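A continuous random variable $X$ is said to have an exponential distribution with
parameter $\lambda$, $\lambda > 0$, if its probability density function is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

or, equivalently, if its cdf is given by

$$F(x) = \int_{-\infty}^{x} f(y)\, dy = \begin{cases} 1 - e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases}$$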


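The mean of the exponential distribution, $E[X]$, is given by

$$E[X] = \int_{-\infty}^{\infty} x f(x)\, dx = \int_0^{\infty} \lambda x e^{-\lambda x}\, dx$$

Integrating by parts ($u = x$, $dv = \lambda e^{-\lambda x}\, dx$) yields

$$E[X] = -x e^{-\lambda x} \Big|_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\, dx = \frac{1}{\lambda}$$
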
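The moment generating function $\phi(t)$ of the exponential distribution is given by

$$\begin{aligned}
\phi(t) &= E[e^{tX}] \\
&= \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\, dx \\
&= \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda
\end{aligned} \tag{5.1}$$

All the moments of $X$ can now be obtained by differentiating Equation (5.1). For
example,

$$\begin{aligned}
E[X^2] &= \frac{d^2}{dt^2} \phi(t) \Big|_{t=0} \\
&= \frac{2\lambda}{(\lambda - t)^3} \Big|_{t=0} \\
&= \frac{2}{\lambda^2}
\end{aligned}$$

Consequently,

$$\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \frac{2}{\lambda^2} - \frac{1}{\lambda^2} \\
&= \frac{1}{\lambda^2}
\end{aligned}$$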
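
As a quick numerical check of the mean, variance, and moment generating function just derived, the following sketch compares the closed forms $1/\lambda$, $1/\lambda^2$, and $\lambda/(\lambda - t)$ with Monte Carlo estimates. The function names, the sample size, and the seed are illustrative choices and not part of the text.

```python
import math
import random


def exponential_moments(lam):
    """Closed-form mean and variance of an exponential with rate lam."""
    return 1.0 / lam, 1.0 / lam ** 2


def mgf(lam, t):
    """Moment generating function lam / (lam - t), finite only for t < lam."""
    if t >= lam:
        raise ValueError("the mgf is finite only for t < lam")
    return lam / (lam - t)


def simulate(lam, t, n=200_000, seed=1):
    """Monte Carlo estimates of E[X], Var(X), and E[exp(tX)]."""
    rng = random.Random(seed)
    xs = [rng.expovariate(lam) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    mgf_est = sum(math.exp(t * x) for x in xs) / n
    return mean, var, mgf_est


if __name__ == "__main__":
    lam, t = 2.0, 0.5
    print("exact:    ", exponential_moments(lam), mgf(lam, t))
    print("simulated:", simulate(lam, t))
```

The simulated values should agree with the exact ones only up to sampling error, which shrinks like one over the square root of the sample size.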


Example 5.1 (Exponential Random Variables and Expected Discounted Returns)
Suppose that you are receiving rewards at randomly changing rates continuously
throughout time. Let $R(x)$ denote the random rate at which you are receiving
rewards at time $x$. For a value $\alpha \geq 0$, called the discount rate, the quantity

$$R = \int_0^{\infty} e^{-\alpha x} R(x)\, dx$$

represents the total discounted reward. (In certain applications, $\alpha$ is a continuously
compounded interest rate, and $R$ is the present value of the infinite flow of
rewards.) Whereas

$$E[R] = E\left[\int_0^{\infty} e^{-\alpha x} R(x)\, dx\right] = \int_0^{\infty} e^{-\alpha x} E[R(x)]\, dx$$

is the expected total discounted reward, we will show that it is also equal to the
expected total reward earned up to an exponentially distributed random time
with rate $\alpha$.

Let $T$ be an exponential random variable with rate $\alpha$ that is independent of all
the random variables $R(x)$. We want to argue that

$$\int_0^{\infty} e^{-\alpha x} E[R(x)]\, dx = E\left[\int_0^{T} R(x)\, dx\right]$$

To show this define for each $x \geq 0$ a random variable $I(x)$ by

$$I(x) = \begin{cases} 1, & \text{if } x \leq T \\ 0, & \text{if } x > T \end{cases}$$

and note that

$$\int_0^{T} R(x)\, dx = \int_0^{\infty} R(x) I(x)\, dx$$

Thus,

$$\begin{aligned}
E\left[\int_0^{T} R(x)\, dx\right] &= E\left[\int_0^{\infty} R(x) I(x)\, dx\right] \\
&= \int_0^{\infty} E[R(x) I(x)]\, dx \\
&= \int_0^{\infty} E[R(x)] E[I(x)]\, dx \quad \text{by independence} \\
&= \int_0^{\infty} E[R(x)] P\{T \geq x\}\, dx \\
&= \int_0^{\infty} e^{-\alpha x} E[R(x)]\, dx
\end{aligned}$$

Therefore, the expected total discounted reward is equal to the expected total
(undiscounted) reward earned by a random time that is exponentially distributed
with a rate equal to the discount factor. ■
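
The identity in Example 5.1 can be checked numerically in a simple special case. The sketch below assumes the deterministic reward rate $R(x) = x$ (an illustrative choice, not from the text), for which the discounted total is $\int_0^\infty e^{-\alpha x} x\, dx = 1/\alpha^2$ and the undiscounted reward up to time $T$ is $T^2/2$. The function names and simulation settings are hypothetical.

```python
import random


def discounted_total(alpha):
    """Integral of exp(-alpha*x) * R(x) over [0, inf) for R(x) = x, i.e. 1/alpha**2."""
    return 1.0 / alpha ** 2


def undiscounted_until_T(alpha, n=200_000, seed=2):
    """Monte Carlo estimate of E[ integral_0^T R(x) dx ] with R(x) = x.

    T is exponential with rate alpha, and the inner integral equals T**2 / 2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.expovariate(alpha)
        total += t * t / 2.0
    return total / n


if __name__ == "__main__":
    alpha = 2.0
    print(discounted_total(alpha), undiscounted_until_T(alpha))
```

Both numbers estimate the same quantity, so they should agree up to Monte Carlo error.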


5.2.2 Properties of the Exponential Distribution 


A random variable $X$ is said to be without memory, or memoryless, if

$$P\{X > s + t \mid X > t\} = P\{X > s\} \quad \text{for all } s, t \geq 0 \tag{5.2}$$

If we think of $X$ as being the lifetime of some instrument, then Equation (5.2)
states that the probability that the instrument lives for at least $s + t$ hours given
that it has survived $t$ hours is the same as the initial probability that it lives for
at least $s$ hours. In other words, if the instrument is alive at time $t$, then the
distribution of the remaining amount of time that it survives is the same as the
original lifetime distribution; that is, the instrument does not remember that it
has already been in use for a time $t$.
The condition in Equation (5.2) is equivalent to

$$\frac{P\{X > s + t,\, X > t\}}{P\{X > t\}} = P\{X > s\}$$

or

$$P\{X > s + t\} = P\{X > s\} P\{X > t\} \tag{5.3}$$

Since Equation (5.3) is satisfied when $X$ is exponentially distributed (for
$e^{-\lambda(s+t)} = e^{-\lambda s} e^{-\lambda t}$), it follows that exponentially distributed random variables
are memoryless.
Example 5.2 Suppose that the amount of time one spends in a bank is
exponentially distributed with mean ten minutes, that is, $\lambda = \frac{1}{10}$. What is the
probability that a customer will spend more than fifteen minutes in the bank?
What is the probability that a customer will spend more than fifteen minutes in
the bank given that she is still in the bank after ten minutes?


Solution: If $X$ represents the amount of time that the customer spends in the
bank, then the first probability is just

$$P\{X > 15\} = e^{-15\lambda} = e^{-3/2} \approx 0.223$$

The second question asks for the probability that a customer who has spent ten
minutes in the bank will have to spend at least five more minutes. However,
since the exponential distribution does not “remember” that the customer has
already spent ten minutes in the bank, this must equal the probability that an
entering customer spends at least five minutes in the bank. That is, the desired
probability is just

$$P\{X > 5\} = e^{-5\lambda} = e^{-1/2} \approx 0.607$$

■


Example 5.3 Consider a post office that is run by two clerks. Suppose that when 
Mr. Smith enters the system he discovers that Mr. Jones is being served by one of 
the clerks and Mr. Brown by the other. Suppose also that Mr. Smith is told that 
his service will begin as soon as either Jones or Brown leaves. If the amount of 
time that a clerk spends with a customer is exponentially distributed with mean
$1/\lambda$, what is the probability that, of the three customers, Mr. Smith is the last to
leave the post office? 


Solution: The answer is obtained by this reasoning: Consider the time at which 
Mr. Smith first finds a free clerk. At this point either Mr. Jones or Mr. Brown 
would have just left and the other one would still be in service. However, by 
the lack of memory of the exponential, it follows that the amount of time that 
this other man (either Jones or Brown) would still have to spend in the post 
office is exponentially distributed with mean $1/\lambda$. That is, it is the same as if he
were just starting his service at this point. Hence, by symmetry, the probability
that he finishes before Smith must equal $\frac{1}{2}$. ■


Example 5.4 The dollar amount of damage involved in an automobile accident 
is an exponential random variable with mean 1000. Of this, the insurance com- 
pany only pays that amount exceeding (the deductible amount of) 400. Find the 
expected value and the standard deviation of the amount the insurance company 
pays per accident. 


Solution: If $X$ is the dollar amount of damage resulting from an accident,
then the amount paid by the insurance company is $(X - 400)^+$ (where $a^+$ is
defined to equal $a$ if $a > 0$ and to equal 0 if $a \leq 0$). Whereas we could certainly
determine the expected value and variance of $(X - 400)^+$ from first principles,
it is easier to condition on whether $X$ exceeds 400. So, let

$$I = \begin{cases} 1, & \text{if } X > 400 \\ 0, & \text{if } X \leq 400 \end{cases}$$

Let $Y = (X - 400)^+$ be the amount paid. By the lack of memory property of the
exponential, it follows that if a damage amount exceeds 400, then the amount
by which it exceeds it is exponential with mean 1000. Therefore,

$$\begin{aligned}
E[Y \mid I = 1] &= 1000 \\
E[Y \mid I = 0] &= 0 \\
\operatorname{Var}(Y \mid I = 1) &= (1000)^2 \\
\operatorname{Var}(Y \mid I = 0) &= 0
\end{aligned}$$

which can be conveniently written as

$$E[Y \mid I] = 10^3 I, \qquad \operatorname{Var}(Y \mid I) = 10^6 I$$

Because $I$ is a Bernoulli random variable that is equal to 1 with probability
$e^{-0.4}$, we obtain

$$E[Y] = E[E[Y \mid I]] = 10^3 E[I] = 10^3 e^{-0.4} \approx 670.32$$

and, by the conditional variance formula

$$\begin{aligned}
\operatorname{Var}(Y) &= E[\operatorname{Var}(Y \mid I)] + \operatorname{Var}(E[Y \mid I]) \\
&= 10^6 e^{-0.4} + 10^6 e^{-0.4}(1 - e^{-0.4})
\end{aligned}$$

where the final equality used that the variance of a Bernoulli random variable
with parameter $p$ is $p(1 - p)$. Consequently,

$$\sqrt{\operatorname{Var}(Y)} \approx 944.09$$

■
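
The conditioning argument of Example 5.4 translates directly into a few lines of code. The sketch below evaluates the closed-form mean and standard deviation of the insurer's payment and, as a cross-check, estimates the same quantities by simulation. The function names, default arguments, and simulation settings are illustrative assumptions.

```python
import math
import random


def payment_mean_sd(mean_damage=1000.0, deductible=400.0):
    """Mean and standard deviation of (X - deductible)^+ for exponential X."""
    p = math.exp(-deductible / mean_damage)   # P{X > deductible} = E[I]
    mean = mean_damage * p                    # E[Y] = E[E[Y | I]]
    # Conditional variance formula: E[Var(Y | I)] + Var(E[Y | I])
    var = mean_damage ** 2 * p + mean_damage ** 2 * p * (1.0 - p)
    return mean, math.sqrt(var)


def simulate_payment(mean_damage=1000.0, deductible=400.0, n=200_000, seed=3):
    """Monte Carlo estimate of the mean and standard deviation of the payment."""
    rng = random.Random(seed)
    ys = [max(rng.expovariate(1.0 / mean_damage) - deductible, 0.0)
          for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return mean, math.sqrt(var)


if __name__ == "__main__":
    print(payment_mean_sd())
    print(simulate_payment())
```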


It turns out that not only is the exponential distribution “memoryless,” but it
is the unique distribution possessing this property. To see this, suppose that $X$ is
memoryless and let $\bar{F}(x) = P\{X > x\}$. Then by Equation (5.3) it follows that

$$\bar{F}(s + t) = \bar{F}(s)\bar{F}(t)$$

That is, $\bar{F}(x)$ satisfies the functional equation

$$g(s + t) = g(s)g(t)$$

However, it turns out that the only right continuous solution of this functional
equation is*

$$g(x) = e^{-\lambda x}$$

and since a distribution function is always right continuous we must have

$$\bar{F}(x) = e^{-\lambda x}$$

or

$$F(x) = P\{X \leq x\} = 1 - e^{-\lambda x}$$

which shows that $X$ is exponentially distributed.


Example 5.5 A store must decide how much of a certain commodity to order 
so as to meet next month’s demand, where that demand is assumed to have an 
exponential distribution with rate $\lambda$. If the commodity costs the store $c$ per pound,
and can be sold at a price of s > c per pound, how much should be ordered so as 
to maximize the store’s expected profit? Assume that any inventory left over at 
the end of the month is worthless and that there is no penalty if the store cannot 
meet all the demand. 


* This is proven as follows: If $g(s + t) = g(s)g(t)$, then

$$g\left(\frac{2}{n}\right) = g\left(\frac{1}{n} + \frac{1}{n}\right) = g^2\left(\frac{1}{n}\right)$$

and repeating this yields $g(m/n) = g^m(1/n)$. Also,

$$g(1) = g\left(\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}\right) = g^n\left(\frac{1}{n}\right) \quad \text{or} \quad g\left(\frac{1}{n}\right) = (g(1))^{1/n}$$

Hence $g(m/n) = (g(1))^{m/n}$, which implies, since $g$ is right continuous, that $g(x) = (g(1))^x$. Since
$g(1) = \left(g\left(\frac{1}{2}\right)\right)^2 \geq 0$ we obtain $g(x) = e^{-\lambda x}$, where $\lambda = -\log(g(1))$.


Solution: Let $X$ equal the demand. If the store orders the amount $t$, then the
profit, call it $P$, is given by

$$P = s \min(X, t) - ct$$

Writing

$$\min(X, t) = X - (X - t)^+$$

we obtain, upon conditioning on whether $X > t$ and then using the lack of memory
property of the exponential, that

$$\begin{aligned}
E[(X - t)^+] &= E[(X - t)^+ \mid X > t] P\{X > t\} + E[(X - t)^+ \mid X \leq t] P\{X \leq t\} \\
&= E[(X - t)^+ \mid X > t] e^{-\lambda t} \\
&= \frac{1}{\lambda} e^{-\lambda t}
\end{aligned}$$

where the final equality used the lack of memory property of exponential random
variables to conclude that, conditional on $X$ exceeding $t$, the amount by
which it exceeds it is an exponential random variable with rate $\lambda$. Hence,

$$E[\min(X, t)] = \frac{1}{\lambda} - \frac{1}{\lambda} e^{-\lambda t}$$

giving that

$$E[P] = \frac{s}{\lambda} - \frac{s}{\lambda} e^{-\lambda t} - ct$$

Differentiation now yields that the maximal profit is attained when $s e^{-\lambda t} - c = 0$;
that is, when

$$t = \frac{1}{\lambda} \log(s/c)$$


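Now, suppose that all unsold inventory can be returned for the amount $r < \min(s, c)$
per pound; and also that there is a penalty cost $p$ per pound of unmet
demand. In this case, using our previously derived expression for $E[P]$, we have

$$E[P] = \frac{s}{\lambda} - \frac{s}{\lambda} e^{-\lambda t} - ct + rE[(t - X)^+] - pE[(X - t)^+]$$

Using that

$$\min(X, t) = t - (t - X)^+$$

we see that

$$E[(t - X)^+] = t - E[\min(X, t)] = t - \frac{1}{\lambda} + \frac{1}{\lambda} e^{-\lambda t}$$

Hence,

$$\begin{aligned}
E[P] &= \frac{s}{\lambda} - \frac{s}{\lambda} e^{-\lambda t} - ct + rt - \frac{r}{\lambda} + \frac{r}{\lambda} e^{-\lambda t} - \frac{p}{\lambda} e^{-\lambda t} \\
&= \frac{s - r}{\lambda} + \frac{r - s - p}{\lambda} e^{-\lambda t} - (c - r)t
\end{aligned}$$

Calculus now yields that the optimal amount to order is

$$t = \frac{1}{\lambda} \log\left(\frac{s + p - r}{c - r}\right)$$

It is worth noting that the optimal amount to order increases in $s$, $p$, and $r$ and
decreases in $\lambda$ and $c$. (Are these monotonicity properties intuitive?) ■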
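
The ordering rule of Example 5.5 can be sketched in code as follows. The function `expected_profit` evaluates $E[P]$ from the expectations $E[\min(X,t)]$, $E[(t-X)^+]$, and $E[(X-t)^+]$, and `optimal_order` returns the maximizing quantity; setting `r = p = 0` recovers the first part of the example. The function and parameter names are hypothetical, and the numerical values in the demo are arbitrary.

```python
import math


def expected_profit(t, lam, s, c, r=0.0, p=0.0):
    """E[P] when ordering t, with salvage value r and shortage penalty p per pound."""
    decay = math.exp(-lam * t)
    e_min = (1.0 - decay) / lam     # E[min(X, t)], expected amount sold
    e_left = t - e_min              # E[(t - X)^+], expected unsold inventory
    e_short = decay / lam           # E[(X - t)^+], expected unmet demand
    return s * e_min - c * t + r * e_left - p * e_short


def optimal_order(lam, s, c, r=0.0, p=0.0):
    """Maximizer of E[P]: (1/lam) * log((s + p - r) / (c - r)).

    Assumes s > c and r < min(s, c), as in the example.
    """
    return math.log((s + p - r) / (c - r)) / lam


if __name__ == "__main__":
    lam, s, c = 0.01, 5.0, 2.0
    t_star = optimal_order(lam, s, c)
    print(t_star, expected_profit(t_star, lam, s, c))
```

Because $E[P]$ is concave in $t$, comparing the profit at the returned quantity with the profit slightly to either side is a simple sanity check.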


The memoryless property is further illustrated by the failure rate function (also 
called the hazard rate function) of the exponential distribution. 

Consider a continuous positive random variable $X$ having distribution function
$F$ and density $f$. The failure (or hazard) rate function $r(t)$ is defined by

$$r(t) = \frac{f(t)}{1 - F(t)} \tag{5.4}$$

To interpret $r(t)$, suppose that an item, having lifetime $X$, has survived for $t$ hours,
and we desire the probability that it does not survive for an additional time $dt$.
That is, consider $P\{X \in (t, t + dt) \mid X > t\}$. Now,

$$\begin{aligned}
P\{X \in (t, t + dt) \mid X > t\} &= \frac{P\{X \in (t, t + dt),\, X > t\}}{P\{X > t\}} \\
&= \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \\
&\approx \frac{f(t)\, dt}{1 - F(t)} \\
&= r(t)\, dt
\end{aligned}$$

That is, $r(t)$ represents the conditional probability density that a $t$-year-old item
will fail.

Suppose now that the lifetime distribution is exponential. Then, by the memoryless
property, it follows that the distribution of remaining life for a $t$-year-old
item is the same as for a new item. Hence, $r(t)$ should be constant. This checks
out since

$$r(t) = \frac{f(t)}{1 - F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$


Thus, the failure rate function for the exponential distribution is constant. The
parameter $\lambda$ is often referred to as the rate of the distribution. (Note that the rate
is the reciprocal of the mean, and vice versa.)

It turns out that the failure rate function $r(t)$ uniquely determines the distribution
$F$. To prove this, we note by Equation (5.4) that

$$r(t) = \frac{\frac{d}{dt} F(t)}{1 - F(t)}$$

Integrating both sides yields

$$\log(1 - F(t)) = -\int_0^t r(t)\, dt + k$$

or

$$1 - F(t) = e^k \exp\left\{-\int_0^t r(t)\, dt\right\}$$

Letting $t = 0$ shows that $k = 0$ and thus

$$F(t) = 1 - \exp\left\{-\int_0^t r(t)\, dt\right\}$$

The preceding identity can also be used to show that exponential random
variables are the only ones that are memoryless. For if $X$ is memoryless, then
its failure rate function must be constant. But if $r(t) = c$, then by the preceding
equation

$$1 - F(t) = e^{-ct}$$

showing that the random variable is exponential.
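
The identity $F(t) = 1 - \exp\{-\int_0^t r(u)\, du\}$ gives a direct way to recover a distribution from its failure rate. A minimal sketch, assuming the hazard is supplied as a Python callable and integrated with the midpoint rule (the function name and step count are illustrative choices):

```python
import math


def cdf_from_hazard(r, t, steps=10_000):
    """F(t) = 1 - exp(-integral_0^t r(u) du), using the midpoint rule."""
    h = t / steps
    integral = sum(r((k + 0.5) * h) for k in range(steps)) * h
    return 1.0 - math.exp(-integral)


if __name__ == "__main__":
    # A constant hazard c = 2 should reproduce the exponential cdf 1 - exp(-2t).
    print(cdf_from_hazard(lambda u: 2.0, 1.5), 1.0 - math.exp(-3.0))
    # A linearly increasing hazard r(u) = u gives F(t) = 1 - exp(-t**2 / 2).
    print(cdf_from_hazard(lambda u: u, 2.0), 1.0 - math.exp(-2.0))
```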


Example 5.6 Let $X_1, \ldots, X_n$ be independent exponential random variables with
respective rates $\lambda_1, \ldots, \lambda_n$, where $\lambda_i \neq \lambda_j$ when $i \neq j$. Let $T$ be independent of
these random variables and suppose that

$$\sum_{j=1}^{n} P_j = 1 \quad \text{where } P_j = P\{T = j\}$$

The random variable $X_T$ is said to be a hyperexponential random variable. To
see how such a random variable might originate, imagine that a bin contains $n$
different types of batteries, with a type $j$ battery lasting for an exponentially
distributed time with rate $\lambda_j$, $j = 1, \ldots, n$. Suppose further that $P_j$ is the proportion
of batteries in the bin that are type $j$ for each $j = 1, \ldots, n$. If a battery is randomly
chosen, in the sense that it is equally likely to be any of the batteries in
the bin, then the lifetime of the battery selected will have the hyperexponential
distribution specified in the preceding.

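To obtain the distribution function $F$ of $X = X_T$, condition on $T$. This yields

$$\begin{aligned}
1 - F(t) &= P\{X > t\} \\
&= \sum_{i=1}^{n} P\{X > t \mid T = i\} P\{T = i\} \\
&= \sum_{i=1}^{n} P_i e^{-\lambda_i t}
\end{aligned}$$

Differentiation of the preceding yields $f$, the density function of $X$.

$$f(t) = \sum_{i=1}^{n} \lambda_i P_i e^{-\lambda_i t}$$

Consequently, the failure rate function of a hyperexponential random variable is

$$r(t) = \frac{\sum_{j=1}^{n} P_j \lambda_j e^{-\lambda_j t}}{\sum_{i=1}^{n} P_i e^{-\lambda_i t}}$$

By noting that

$$\begin{aligned}
P\{T = j \mid X > t\} &= \frac{P\{X > t \mid T = j\} P\{T = j\}}{P\{X > t\}} \\
&= \frac{P_j e^{-\lambda_j t}}{\sum_{i=1}^{n} P_i e^{-\lambda_i t}}
\end{aligned}$$

we see that the failure rate function $r(t)$ can also be written as

$$r(t) = \sum_{j=1}^{n} \lambda_j P\{T = j \mid X > t\}$$

If $\lambda_1 < \lambda_i$, for all $i > 1$, then

$$\begin{aligned}
P\{T = 1 \mid X > t\} &= \frac{P_1 e^{-\lambda_1 t}}{P_1 e^{-\lambda_1 t} + \sum_{i=2}^{n} P_i e^{-\lambda_i t}} \\
&= \frac{P_1}{P_1 + \sum_{i=2}^{n} P_i e^{-(\lambda_i - \lambda_1) t}} \\
&\to 1 \quad \text{as } t \to \infty
\end{aligned}$$

Similarly, $P\{T = i \mid X > t\} \to 0$ when $i \neq 1$, thus showing that

$$\lim_{t \to \infty} r(t) = \min_i \lambda_i$$

That is, as a randomly chosen battery ages its failure rate converges to the failure
rate of the exponential type having the smallest failure rate, which is intuitive
since the longer the battery lasts, the more likely it is a battery type with the
smallest failure rate. ■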
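
To see the convergence of the hyperexponential failure rate numerically, the sketch below evaluates $r(t) = \sum_j P_j \lambda_j e^{-\lambda_j t} / \sum_i P_i e^{-\lambda_i t}$ for a given mixture. The function name and the example mixture are illustrative assumptions. The common factor $e^{-\lambda_{\min} t}$ is cancelled from numerator and denominator so that large $t$ does not underflow both sums to zero.

```python
import math


def hyperexp_failure_rate(t, probs, rates):
    """Failure rate of a hyperexponential mixture at time t."""
    lam_min = min(rates)
    # weights[j] is proportional to P{T = j | X > t}
    weights = [p * math.exp(-(lam - lam_min) * t) for p, lam in zip(probs, rates)]
    return sum(w * lam for w, lam in zip(weights, rates)) / sum(weights)


if __name__ == "__main__":
    probs, rates = [0.2, 0.3, 0.5], [1.0, 2.0, 4.0]
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, hyperexp_failure_rate(t, probs, rates))
```

At $t = 0$ the value is the mixture average $\sum_j P_j \lambda_j$, and for large $t$ it approaches the smallest rate.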


5.2.3 Further Properties of the Exponential Distribution 


Let $X_1, \ldots, X_n$ be independent and identically distributed exponential random
variables having mean $1/\lambda$. It follows from the results of Example 2.39 that
$X_1 + \cdots + X_n$ has a gamma distribution with parameters $n$ and $\lambda$. Let us now give
a second verification of this result by using mathematical induction. Because there
is nothing to prove when $n = 1$, let us start by assuming that $X_1 + \cdots + X_{n-1}$
has density given by

$$f_{X_1 + \cdots + X_{n-1}}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-2}}{(n - 2)!}$$

Hence,

$$\begin{aligned}
f_{X_1 + \cdots + X_{n-1} + X_n}(t) &= \int_0^{\infty} f_{X_n}(t - s) f_{X_1 + \cdots + X_{n-1}}(s)\, ds \\
&= \int_0^{t} \lambda e^{-\lambda (t - s)} \lambda e^{-\lambda s} \frac{(\lambda s)^{n-2}}{(n - 2)!}\, ds \\
&= \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}
\end{aligned}$$

which proves the result.

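Another useful calculation is to determine the probability that one exponential
random variable is smaller than another. That is, suppose that $X_1$ and $X_2$ are
independent exponential random variables with respective means $1/\lambda_1$ and $1/\lambda_2$;
what is $P\{X_1 < X_2\}$? This probability is easily calculated by conditioning on $X_1$:

$$\begin{aligned}
P\{X_1 < X_2\} &= \int_0^{\infty} P\{X_1 < X_2 \mid X_1 = x\} \lambda_1 e^{-\lambda_1 x}\, dx \\
&= \int_0^{\infty} P\{x < X_2\} \lambda_1 e^{-\lambda_1 x}\, dx \\
&= \int_0^{\infty} e^{-\lambda_2 x} \lambda_1 e^{-\lambda_1 x}\, dx \\
&= \int_0^{\infty} \lambda_1 e^{-(\lambda_1 + \lambda_2) x}\, dx \\
&= \frac{\lambda_1}{\lambda_1 + \lambda_2}
\end{aligned} \tag{5.5}$$

Suppose that $X_1, X_2, \ldots, X_n$ are independent exponential random variables, with
$X_i$ having rate $\mu_i$, $i = 1, \ldots, n$. It turns out that the smallest of the $X_i$ is exponential
with a rate equal to the sum of the $\mu_i$. This is shown as follows:

$$\begin{aligned}
P\{\text{minimum}(X_1, \ldots, X_n) > x\} &= P\{X_i > x \text{ for each } i = 1, \ldots, n\} \\
&= \prod_{i=1}^{n} P\{X_i > x\} \quad \text{(by independence)} \\
&= \prod_{i=1}^{n} e^{-\mu_i x} \\
&= \exp\left\{-\left(\sum_{i=1}^{n} \mu_i\right) x\right\}
\end{aligned} \tag{5.6}$$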
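
Equations (5.5) and (5.6) are easy to check by simulation in the case of two exponentials. The sketch below estimates $P\{X_1 < X_2\}$ and $E[\min(X_1, X_2)]$, which should be close to $\lambda_1/(\lambda_1 + \lambda_2)$ and $1/(\lambda_1 + \lambda_2)$ respectively. The function name, rates, sample size, and seed are illustrative.

```python
import random


def simulate_race(lam1, lam2, n=200_000, seed=4):
    """Estimate P{X1 < X2} and E[min(X1, X2)] for independent exponentials."""
    rng = random.Random(seed)
    wins = 0
    total_min = 0.0
    for _ in range(n):
        x1 = rng.expovariate(lam1)
        x2 = rng.expovariate(lam2)
        wins += x1 < x2
        total_min += min(x1, x2)
    return wins / n, total_min / n


if __name__ == "__main__":
    lam1, lam2 = 1.0, 3.0
    print(simulate_race(lam1, lam2))
    print(lam1 / (lam1 + lam2), 1.0 / (lam1 + lam2))
```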
Example 5.7 (Analyzing Greedy Algorithms for the Assignment Problem)
A group of $n$ people is to be assigned to a set of $n$ jobs, with one person assigned to
each job. For a given set of $n^2$ values $C(i, j)$, $i, j = 1, \ldots, n$, a cost $C(i, j)$ is incurred when
person $i$ is assigned to job $j$. The classical assignment problem is to determine the
set of assignments that minimizes the sum of the $n$ costs incurred.

Rather than trying to determine the optimal assignment, let us consider two
heuristic algorithms for solving this problem. The first heuristic is as follows.
Assign person 1 to the job that results in the least cost. That is, person 1 is
assigned to job $j_1$ where $C(1, j_1) = \min_j C(1, j)$. Now eliminate that job
from consideration and assign person 2 to the job that results in the least cost.
That is, person 2 is assigned to job $j_2$ where $C(2, j_2) = \min_{j \neq j_1} C(2, j)$. This
procedure is then continued until all $n$ persons are assigned. Since this procedure
always selects the best job for the person under consideration, we will call it
Greedy Algorithm A.

The second algorithm, which we call Greedy Algorithm B, is a more “global”
version of the first greedy algorithm. It considers all $n^2$ cost values and chooses
the pair $i_1, j_1$ for which $C(i, j)$ is minimal. It then assigns person $i_1$ to job $j_1$. It
then eliminates all cost values involving either person $i_1$ or job $j_1$ (so that $(n - 1)^2$
values remain) and continues in the same fashion. That is, at each stage it chooses
the person and job that have the smallest cost among all the unassigned people
and jobs.

Under the assumption that the $C(i, j)$ constitute a set of $n^2$ independent exponential
random variables each having mean 1, which of the two algorithms results in a
smaller expected total cost?


Solution: Suppose first that Greedy Algorithm A is employed. Let $C_i$ denote
the cost associated with person $i$, $i = 1, \ldots, n$. Now $C_1$ is the minimum of $n$
independent exponentials each having rate 1; so by Equation (5.6) it will be
exponential with rate $n$. Similarly, $C_2$ is the minimum of $n - 1$ independent
exponentials with rate 1, and so is exponential with rate $n - 1$. Indeed, by the
same reasoning $C_i$ will be exponential with rate $n - i + 1$, $i = 1, \ldots, n$. Thus,
the expected total cost under Greedy Algorithm A is

$$\begin{aligned}
E_A[\text{total cost}] &= E[C_1 + \cdots + C_n] \\
&= \sum_{i=1}^{n} \frac{1}{i}
\end{aligned}$$

Let us now analyze Greedy Algorithm B. Let $C_i$ be the cost of the $i$th person-job
pair assigned by this algorithm. Since $C_1$ is the minimum of all the $n^2$
values $C(i, j)$, it follows from Equation (5.6) that $C_1$ is exponential with rate $n^2$.
Now, it follows from the lack of memory property of the exponential that the
amounts by which the other $C(i, j)$ exceed $C_1$ will be independent exponentials
with rates 1. As a result, $C_2$ is equal to $C_1$ plus the minimum of $(n - 1)^2$ independent
exponentials with rate 1. Similarly, $C_3$ is equal to $C_2$ plus the minimum
of $(n - 2)^2$ independent exponentials with rate 1, and so on. Therefore, we see
that

$$\begin{aligned}
E[C_1] &= 1/n^2, \\
E[C_2] &= E[C_1] + 1/(n - 1)^2, \\
E[C_3] &= E[C_2] + 1/(n - 2)^2, \\
&\;\;\vdots \\
E[C_j] &= E[C_{j-1}] + 1/(n - j + 1)^2, \\
&\;\;\vdots \\
E[C_n] &= E[C_{n-1}] + 1
\end{aligned}$$

Therefore,

$$\begin{aligned}
E[C_1] &= 1/n^2, \\
E[C_2] &= 1/n^2 + 1/(n - 1)^2, \\
E[C_3] &= 1/n^2 + 1/(n - 1)^2 + 1/(n - 2)^2, \\
&\;\;\vdots \\
E[C_n] &= 1/n^2 + 1/(n - 1)^2 + 1/(n - 2)^2 + \cdots + 1
\end{aligned}$$

Adding up all the $E[C_i]$ yields

$$\begin{aligned}
E_B[\text{total cost}] &= n/n^2 + (n - 1)/(n - 1)^2 + (n - 2)/(n - 2)^2 + \cdots + 1 \\
&= \sum_{i=1}^{n} \frac{1}{i}
\end{aligned}$$

The expected cost is thus the same for both greedy algorithms. ■
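
The two heuristics of Example 5.7 are short enough to implement directly. The following sketch (function names, trial count, and seed are illustrative; people and jobs are indexed from 0) runs both greedy algorithms on matrices of independent mean-1 exponential costs and compares the average total cost with the harmonic number $\sum_{i=1}^{n} 1/i$.

```python
import random


def greedy_a(cost):
    """Greedy Algorithm A: persons 0, 1, ... in turn take their cheapest free job."""
    free_jobs = set(range(len(cost)))
    total = 0.0
    for row in cost:
        j = min(free_jobs, key=lambda job: row[job])
        total += row[j]
        free_jobs.remove(j)
    return total


def greedy_b(cost):
    """Greedy Algorithm B: repeatedly take the cheapest remaining (person, job) pair."""
    n = len(cost)
    free_people, free_jobs = set(range(n)), set(range(n))
    total = 0.0
    for _ in range(n):
        i, j = min(((a, b) for a in free_people for b in free_jobs),
                   key=lambda pair: cost[pair[0]][pair[1]])
        total += cost[i][j]
        free_people.remove(i)
        free_jobs.remove(j)
    return total


def average_costs(n, trials=20_000, seed=5):
    """Average total cost of both algorithms over random exponential cost matrices."""
    rng = random.Random(seed)
    sum_a = sum_b = 0.0
    for _ in range(trials):
        cost = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
        sum_a += greedy_a(cost)
        sum_b += greedy_b(cost)
    return sum_a / trials, sum_b / trials


def harmonic(n):
    """The n-th harmonic number, the expected total cost derived in the example."""
    return sum(1.0 / i for i in range(1, n + 1))


if __name__ == "__main__":
    print(average_costs(4), harmonic(4))
```

On any particular cost matrix the two algorithms can give different totals; it is only their expected totals that coincide under the exponential assumption.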


Let $X_1, \ldots, X_n$ be independent exponential random variables, with respective
rates $\lambda_1, \ldots, \lambda_n$. A useful result, generalizing Equation (5.5), is that $X_i$ is the
smallest of these with probability $\lambda_i / \sum_j \lambda_j$. This is shown as follows:

$$\begin{aligned}
P\left\{X_i = \min_j X_j\right\} &= P\left\{X_i < \min_{j \neq i} X_j\right\} \\
&= \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}
\end{aligned}$$

where the final equality uses Equation (5.5) along with the fact that $\min_{j \neq i} X_j$ is
exponential with rate $\sum_{j \neq i} \lambda_j$.

Another important fact is that $\min_i X_i$ and the rank ordering of the $X_i$ are
independent. To see why this is true, consider the conditional probability that
$X_{i_1} < X_{i_2} < \cdots < X_{i_n}$ given that the minimal value is greater than $t$. Because
$\min_i X_i > t$ means that all the $X_i$ are greater than $t$, it follows from the lack
of memory property of exponential random variables that their remaining lives
beyond $t$ remain independent exponential random variables with their original
rates. Consequently,

$$\begin{aligned}
P\left\{X_{i_1} < \cdots < X_{i_n} \,\Big|\, \min_i X_i > t\right\} &= P\left\{X_{i_1} - t < \cdots < X_{i_n} - t \,\Big|\, \min_i X_i > t\right\} \\
&= P\{X_{i_1} < \cdots < X_{i_n}\}
\end{aligned}$$

which proves the result.
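
Both facts can be illustrated by simulation. The sketch below estimates, for each $i$, the probability that $X_i$ is the smallest, first unconditionally and then given that the minimum exceeds a threshold. If the minimum is independent of which variable attains it, the two sets of estimates should agree with each other and with $\lambda_i / \sum_j \lambda_j$. The function name, rates, and threshold are illustrative assumptions, and only the identity of the smallest variable (not the full rank ordering) is checked.

```python
import random


def min_index_stats(rates, threshold, n=200_000, seed=6):
    """Estimate P{X_i is smallest}, unconditionally and given min_j X_j > threshold."""
    rng = random.Random(seed)
    k = len(rates)
    counts = [0] * k
    counts_given = [0] * k
    n_given = 0
    for _ in range(n):
        xs = [rng.expovariate(lam) for lam in rates]
        i = min(range(k), key=xs.__getitem__)   # index of the smallest value
        counts[i] += 1
        if xs[i] > threshold:
            counts_given[i] += 1
            n_given += 1
    uncond = [c / n for c in counts]
    cond = [c / n_given for c in counts_given]
    return uncond, cond


if __name__ == "__main__":
    rates = [1.0, 2.0, 3.0]
    print(min_index_stats(rates, threshold=0.1))
    print([lam / sum(rates) for lam in rates])
```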


Example 5.8 Suppose you arrive at a post office having two clerks at a moment
when both are busy but there is no one else waiting in line. You will enter service
when either clerk becomes free. If service times for clerk $i$ are exponential with
rate $\lambda_i$, $i = 1, 2$, find $E[T]$, where $T$ is the amount of time that you spend in the
post office.


Solution: Let $R_i$ denote the remaining service time of the customer with clerk $i$,
$i = 1, 2$, and note, by the lack of memory property of exponentials, that $R_1$
and $R_2$ are independent exponential random variables with respective rates $\lambda_1$
and $\lambda_2$. Conditioning on which of $R_1$ or $R_2$ is the smallest yields

$$\begin{aligned}
E[T] &= E[T \mid R_1 < R_2] P\{R_1 < R_2\} + E[T \mid R_2 \leq R_1] P\{R_2 \leq R_1\} \\
&= E[T \mid R_1 < R_2] \frac{\lambda_1}{\lambda_1 + \lambda_2} + E[T \mid R_2 \leq R_1] \frac{\lambda_2}{\lambda_1 + \lambda_2}
\end{aligned}$$

Now, with $S$ denoting your service time

$$\begin{aligned}
E[T \mid R_1 < R_2] &= E[R_1 + S \mid R_1 < R_2] \\
&= E[R_1 \mid R_1 < R_2] + E[S \mid R_1 < R_2] \\
&= E[R_1 \mid R_1 < R_2] + \frac{1}{\lambda_1} \\
&= \frac{1}{\lambda_1 + \lambda_2} + \frac{1}{\lambda_1}
\end{aligned}$$

The final equation used that conditional on $R_1 < R_2$ the random variable $R_1$
is the minimum of $R_1$ and $R_2$ and is thus exponential with rate $\lambda_1 + \lambda_2$; and
also that conditional on $R_1 < R_2$ you are served by server 1.

As we can show in a similar fashion that

$$E[T \mid R_2 \leq R_1] = \frac{1}{\lambda_1 + \lambda_2} + \frac{1}{\lambda_2}$$

we obtain the result

$$E[T] = \frac{3}{\lambda_1 + \lambda_2}$$

Another way to obtain $E[T]$ is to write $T$ as a sum, take expectations, and
then condition where needed. This approach yields

$$\begin{aligned}
E[T] &= E[\min(R_1, R_2) + S] \\
&= E[\min(R_1, R_2)] + E[S] \\
&= \frac{1}{\lambda_1 + \lambda_2} + E[S]
\end{aligned}$$

To compute $E[S]$, we condition on which of $R_1$ and $R_2$ is smallest.

$$\begin{aligned}
E[S] &= E[S \mid R_1 < R_2] \frac{\lambda_1}{\lambda_1 + \lambda_2} + E[S \mid R_2 \leq R_1] \frac{\lambda_2}{\lambda_1 + \lambda_2} \\
&= \frac{2}{\lambda_1 + \lambda_2}
\end{aligned}$$

■
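
The result $E[T] = 3/(\lambda_1 + \lambda_2)$ of Example 5.8 can be checked with a direct simulation of the post office. In the sketch below (the function name, rates, and simulation settings are illustrative), the remaining service times of the two customers in service are drawn as fresh exponentials, which is justified by the lack of memory property.

```python
import random


def simulate_time_in_post_office(lam1, lam2, n=200_000, seed=7):
    """Estimate E[T]: wait for the first clerk to free up, then be served by that clerk."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        r1 = rng.expovariate(lam1)   # remaining service time at clerk 1
        r2 = rng.expovariate(lam2)   # remaining service time at clerk 2
        if r1 < r2:
            total += r1 + rng.expovariate(lam1)   # served by clerk 1
        else:
            total += r2 + rng.expovariate(lam2)   # served by clerk 2
    return total / n


if __name__ == "__main__":
    lam1, lam2 = 1.0, 2.0
    print(simulate_time_in_post_office(lam1, lam2), 3.0 / (lam1 + lam2))
```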


Example 5.9 There are $n$ cells in the body, of which cells $1, \ldots, k$ are target cells.
Associated with each cell is a weight, with $w_i$ being the weight associated with
cell $i$, $i = 1, \ldots, n$. The cells are destroyed one at a time in a random order, which
is such that if $S$ is the current set of surviving cells then, independent of the order
in which the cells not in $S$ have been destroyed, the next cell killed is $i$, $i \in S$, with
probability $w_i / \sum_{j \in S} w_j$. In other words, the probability that a given surviving
cell is the next one to be killed is the weight of that cell divided by the sum of the
weights of all still surviving cells. Let $A$ denote the total number of cells that are
still alive at the moment when all the cells $1, 2, \ldots, k$ have been killed, and find
$E[A]$.


Solution: Although it would be quite difficult to solve this problem by a direct
combinatorial argument, a nice solution can be obtained by relating the order
in which cells are killed to a ranking of independent exponential random variables.
To do so, let $X_1, \ldots, X_n$ be independent exponential random variables,
with $X_i$ having rate $w_i$, $i = 1, \ldots, n$. Note that $X_i$ will be the smallest of these
exponentials with probability $w_i / \sum_j w_j$; further, given that $X_i$ is the smallest,
$X_r$ will be the next smallest with probability $w_r / \sum_{j \neq i} w_j$; further, given
that $X_i$ and $X_r$ are, respectively, the first and second smallest, $X_s$, $s \neq i, r$,
will be the third smallest with probability $w_s / \sum_{j \neq i, r} w_j$; and so on. Consequently,
if we let $I_j$ be the index of the $j$th smallest of $X_1, \ldots, X_n$ (so that
$X_{I_1} < X_{I_2} < \cdots < X_{I_n}$), then the order in which the cells are destroyed has
the same distribution as $I_1, \ldots, I_n$. So, let us suppose that the order in which
the cells are killed is determined by the ordering of $X_1, \ldots, X_n$. (Equivalently,
we can suppose that all cells will eventually be killed, with cell $i$ being killed at
time $X_i$, $i = 1, \ldots, n$.)

If we let $A_j$ equal 1 if cell $j$ is still alive at the moment when all the cells
$1, \ldots, k$ have been killed, and let it equal 0 otherwise, then

$$A = \sum_{j=k+1}^{n} A_j$$

Because cell $j$ will be alive at the moment when all the cells $1, \ldots, k$ have been
killed if $X_j$ is larger than all the values $X_1, \ldots, X_k$, we see that for $j > k$

$$\begin{aligned}
E[A_j] &= P\{A_j = 1\} \\
&= P\left\{X_j > \max_{i=1,\ldots,k} X_i\right\} \\
&= \int_0^{\infty} P\left\{X_j > \max_{i=1,\ldots,k} X_i \,\Big|\, X_j = x\right\} w_j e^{-w_j x}\, dx \\
&= \int_0^{\infty} P\{X_i < x \text{ for all } i = 1, \ldots, k\} w_j e^{-w_j x}\, dx \\
&= \int_0^{\infty} \prod_{i=1}^{k} (1 - e^{-w_i x})\, w_j e^{-w_j x}\, dx \\
&= \int_0^{1} \prod_{i=1}^{k} (1 - y^{w_i / w_j})\, dy
\end{aligned}$$

where the final equality follows from the substitution $y = e^{-w_j x}$. Thus, we
obtain the result

$$E[A] = \sum_{j=k+1}^{n} \int_0^{1} \prod_{i=1}^{k} (1 - y^{w_i / w_j})\, dy = \int_0^{1} \sum_{j=k+1}^{n} \prod_{i=1}^{k} (1 - y^{w_i / w_j})\, dy$$
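
The integral formula for $E[A]$ in Example 5.9 can be compared with a direct simulation of the killing process. In the sketch below, `expected_alive` evaluates the integral with the midpoint rule and `simulate_alive` kills cells one at a time with probability proportional to weight. Function names, step counts, and the test weights are illustrative assumptions; cells are indexed from 0, so the targets are `weights[0:k]`. When all weights are equal, the formula reduces to $(n - k)/(k + 1)$.

```python
import random


def expected_alive(weights, k, steps=20_000):
    """E[A] = integral_0^1 sum_{j>k} prod_{i<=k} (1 - y**(w_i/w_j)) dy (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for m in range(steps):
        y = (m + 0.5) * h
        for wj in weights[k:]:          # non-target cells
            prod = 1.0
            for wi in weights[:k]:      # target cells
                prod *= 1.0 - y ** (wi / wj)
            total += prod
    return total * h


def simulate_alive(weights, k, trials=50_000, seed=8):
    """Average number of cells alive when the last target cell is killed."""
    rng = random.Random(seed)
    n = len(weights)
    total = 0
    for _ in range(trials):
        alive = list(range(n))
        targets_left = k
        while targets_left > 0:
            victim = rng.choices(alive, weights=[weights[i] for i in alive])[0]
            alive.remove(victim)
            if victim < k:
                targets_left -= 1
        total += len(alive)
    return total / trials


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0, 4.0]
    print(expected_alive(w, 2), simulate_alive(w, 2))
```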


Example 5.10 Suppose that customers are in line to receive service that is provided
sequentially by a server; whenever a service is completed, the next person
in line enters the service facility. However, each waiting customer will only wait
an exponentially distributed time with rate $\theta$; if its service has not yet begun by
this time then it will immediately depart the system. These exponential times,
one for each waiting customer, are independent. In addition, the service times are
independent exponential random variables with rate $\mu$. Suppose that someone is
presently being served and consider the person who is $n$th in line.


(a) Find $P_n$, the probability that this customer is eventually served.
(b) Find $W_n$, the conditional expected amount of time this person spends waiting in line
given that she is eventually served.


Solution: Consider the $n + 1$ random variables consisting of the remaining
service time of the person in service along with the $n$ additional exponential
departure times with rate $\theta$ of the first $n$ in line.

(a) Given that the smallest of these $n + 1$ independent exponentials is the
departure time of the $n$th person in line, the conditional probability that this
person will be served is 0; on the other hand, given that this person’s departure
time is not the smallest, the conditional probability that this person will
be served is the same as if it were initially in position $n - 1$. Since the probability
that a given departure time is the smallest of the $n + 1$ exponentials is
$\theta / (n\theta + \mu)$, we obtain

$$P_n = \frac{(n - 1)\theta + \mu}{n\theta + \mu} P_{n-1}$$

Using the preceding with $n - 1$ replacing $n$ gives

$$P_n = \frac{(n - 1)\theta + \mu}{n\theta + \mu} \cdot \frac{(n - 2)\theta + \mu}{(n - 1)\theta + \mu} P_{n-2} = \frac{(n - 2)\theta + \mu}{n\theta + \mu} P_{n-2}$$

Continuing in this fashion yields the result

$$P_n = \frac{\theta + \mu}{n\theta + \mu} P_1 = \frac{\mu}{n\theta + \mu}$$

n 


(b) To determine an expression for $W_n$, we use the fact that the minimum of
independent exponentials is, independent of their rank ordering, exponential
with a rate equal to the sum of the rates. Since the time until the $n$th person
in line enters service is the minimum of these $n + 1$ random variables plus the
additional time thereafter, we see, upon using the lack of memory property of
exponential random variables, that

$$W_n = \frac{1}{n\theta + \mu} + W_{n-1}$$

Repeating the preceding argument with successively smaller values of $n$ yields
the solution

$$W_n = \sum_{i=1}^{n} \frac{1}{i\theta + \mu}$$

■
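
The formulas $P_n = \mu/(n\theta + \mu)$ and $W_n = \sum_{i=1}^{n} 1/(i\theta + \mu)$ of Example 5.10 can be compared with a simulation that does not use the recursion. In the sketch below (names and settings are illustrative), each customer ahead of ours is given an independent patience time, and such a customer enters service only if still waiting when the server becomes free.

```python
import random


def prob_served(n, theta, mu):
    """P_n = mu / (n*theta + mu)."""
    return mu / (n * theta + mu)


def mean_wait_given_served(n, theta, mu):
    """W_n = sum_{i=1}^{n} 1 / (i*theta + mu)."""
    return sum(1.0 / (i * theta + mu) for i in range(1, n + 1))


def simulate_queue(n, theta, mu, trials=200_000, seed=9):
    """Follow the customer who starts nth in line; return estimates of (P_n, W_n)."""
    rng = random.Random(seed)
    served = 0
    wait_total = 0.0
    for _ in range(trials):
        free_at = rng.expovariate(mu)                # end of the service in progress
        for _ in range(n - 1):                       # customers ahead of ours, in order
            if rng.expovariate(theta) > free_at:     # still waiting, so enters service
                free_at += rng.expovariate(mu)
        if rng.expovariate(theta) > free_at:         # our customer is still waiting
            served += 1
            wait_total += free_at
    return served / trials, wait_total / served


if __name__ == "__main__":
    n, theta, mu = 3, 1.0, 2.0
    print(prob_served(n, theta, mu), mean_wait_given_served(n, theta, mu))
    print(simulate_queue(n, theta, mu))
```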


5.2.4 Convolutions of Exponential Random Variables 


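Let $X_i$, $i = 1, \ldots, n$, be independent exponential random variables with respective
rates $\lambda_i$, $i = 1, \ldots, n$, and suppose that $\lambda_i \neq \lambda_j$ for $i \neq j$. The random variable
$\sum_{i=1}^{n} X_i$ is said to be a hypoexponential random variable. To compute its probability
density function, let us start with the case $n = 2$. Now,

$$\begin{aligned}
f_{X_1 + X_2}(t) &= \int_0^t f_{X_1}(s) f_{X_2}(t - s)\, ds \\
&= \int_0^t \lambda_1 e^{-\lambda_1 s} \lambda_2 e^{-\lambda_2 (t - s)}\, ds \\
&= \lambda_1 \lambda_2 e^{-\lambda_2 t} \int_0^t e^{-(\lambda_1 - \lambda_2) s}\, ds \\
&= \frac{\lambda_1}{\lambda_1 - \lambda_2} \lambda_2 e^{-\lambda_2 t} \left(1 - e^{-(\lambda_1 - \lambda_2) t}\right) \\
&= \frac{\lambda_1}{\lambda_1 - \lambda_2} \lambda_2 e^{-\lambda_2 t} + \frac{\lambda_2}{\lambda_2 - \lambda_1} \lambda_1 e^{-\lambda_1 t}
\end{aligned}$$

Using the preceding, a similar computation yields, when $n = 3$,

$$f_{X_1 + X_2 + X_3}(t) = \sum_{i=1}^{3} \lambda_i e^{-\lambda_i t} \left(\prod_{j \neq i} \frac{\lambda_j}{\lambda_j - \lambda_i}\right)$$

which suggests the general result

$$f_{X_1 + \cdots + X_n}(t) = \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}$$

where

$$C_{i,n} = \prod_{j \neq i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$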

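We will now prove the preceding formula by induction on $n$. Since we have
already established it for $n = 2$, assume it for $n$ and consider $n + 1$ arbitrary
independent exponentials $X_i$ with distinct rates $\lambda_i$, $i = 1, \ldots, n + 1$. If necessary,
renumber $X_1$ and $X_{n+1}$ so that $\lambda_{n+1} < \lambda_1$. Now,

$$\begin{aligned}
f_{X_1 + \cdots + X_{n+1}}(t) &= \int_0^t f_{X_1 + \cdots + X_n}(s) \lambda_{n+1} e^{-\lambda_{n+1}(t - s)}\, ds \\
&= \sum_{i=1}^{n} C_{i,n} \int_0^t \lambda_i e^{-\lambda_i s} \lambda_{n+1} e^{-\lambda_{n+1}(t - s)}\, ds \\
&= \sum_{i=1}^{n} C_{i,n} \left(\frac{\lambda_i}{\lambda_i - \lambda_{n+1}} \lambda_{n+1} e^{-\lambda_{n+1} t} + \frac{\lambda_{n+1}}{\lambda_{n+1} - \lambda_i} \lambda_i e^{-\lambda_i t}\right) \\
&= K_{n+1} \lambda_{n+1} e^{-\lambda_{n+1} t} + \sum_{i=1}^{n} C_{i,n+1} \lambda_i e^{-\lambda_i t}
\end{aligned} \tag{5.7}$$

where $K_{n+1} = \sum_{i=1}^{n} C_{i,n} \lambda_i / (\lambda_i - \lambda_{n+1})$ is a constant that does not depend on $t$.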
But, we also have that 


t 
POA (t) -| cet are (s)Aye 1-9) ds 


which implies, by the same argument that resulted in Equation (5.7), that for a 
constant K, 


n+1 

=e =e 

feppoix, 4 @ = Kidse i) Ci np1hrie it 
i=2 
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Equating these two expressions for fx, +...4x,,,(£) yields 
ant hat —Ait a 
Kypiarngie 7 + Cyngiare “lM = Kyaye* + Capi ntidangie 7! 


Multiplying both sides of the preceding equation by e*+’ and then letting t > oo 
yields [since e~ “142+? 5 0 ast > oo] 


Ky = Chatn+1 


and this, using Equation (5.7), completes the induction proof. Thus, we have 
shown that if S = }77_, Xj, then 


fs(t) = S Gye (5.8) 
i=1 
where 
x; 
C; n= : 
> I] Aj —iAj 
j#l 


Integrating both sides of the expression for $f_S$ from $t$ to $\infty$ yields that the tail distribution function of $S$ is given by

$$P\{S > t\} = \sum_{i=1}^{n} C_{i,n}\, e^{-\lambda_i t} \tag{5.9}$$

Hence, we obtain from Equations (5.8) and (5.9) that $r_S(t)$, the failure rate function of $S$, is as follows:

$$r_S(t) = \frac{\sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i t}}{\sum_{i=1}^{n} C_{i,n}\, e^{-\lambda_i t}}$$

If we let $\lambda_j = \min(\lambda_1, \ldots, \lambda_n)$, then it follows, upon multiplying the numerator and denominator of $r_S(t)$ by $e^{\lambda_j t}$, that

$$\lim_{t \to \infty} r_S(t) = \lambda_j$$

From the preceding, we can conclude that the remaining lifetime of a hypoexponentially distributed item that has survived to age $t$ is, for $t$ large, approximately that of an exponentially distributed random variable with a rate equal to the minimum of the rates of the random variables whose sums make up the hypoexponential.

Remark Although

$$1 = \int_0^\infty f_S(t)\, dt = \sum_{i=1}^{n} C_{i,n} = \sum_{i=1}^{n} \prod_{j \neq i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$

it should not be thought that the $C_{i,n}$, $i = 1, \ldots, n$, are probabilities, because some of them will be negative. Thus, while the form of the hypoexponential density is similar to that of the hyperexponential density (see Example 5.6) these two random variables are very different.
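To make Equations (5.8) and (5.9) and the Remark concrete, here is a small sketch that computes the coefficients $C_{i,n}$, the density, the tail, and the failure rate for a list of distinct rates. The function names and the example rates are illustrative assumptions, not part of the text.

```python
import math


def hypoexp_coefficients(rates):
    """C_{i,n} = prod_{j != i} rates[j] / (rates[j] - rates[i]); rates must be distinct."""
    coeffs = []
    for i, rate_i in enumerate(rates):
        c = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                c *= rate_j / (rate_j - rate_i)
        coeffs.append(c)
    return coeffs


def hypoexp_density(t, rates):
    """Equation (5.8): sum_i C_{i,n} * rate_i * exp(-rate_i * t)."""
    coeffs = hypoexp_coefficients(rates)
    return sum(c * r * math.exp(-r * t) for c, r in zip(coeffs, rates))


def hypoexp_tail(t, rates):
    """Equation (5.9): P{S > t} = sum_i C_{i,n} * exp(-rate_i * t)."""
    coeffs = hypoexp_coefficients(rates)
    return sum(c * math.exp(-r * t) for c, r in zip(coeffs, rates))


def hypoexp_failure_rate(t, rates):
    """r_S(t) = f_S(t) / P{S > t}."""
    return hypoexp_density(t, rates) / hypoexp_tail(t, rates)


if __name__ == "__main__":
    rates = [1.0, 2.0, 4.0]  # hypothetical distinct rates
    print(hypoexp_coefficients(rates))       # sums to 1, but one entry is negative
    print(hypoexp_failure_rate(30.0, rates))  # close to min(rates) for large t
```

With the rates 1, 2, 4 the coefficients are 8/3, -2, and 1/3: they sum to 1 even though one of them is negative, which is the point of the Remark.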


Example 5.11 Let $X_1, \ldots, X_m$ be independent exponential random variables with respective rates $\lambda_1, \ldots, \lambda_m$, where $\lambda_i \neq \lambda_j$ when $i \neq j$. Let $N$ be independent of these random variables and suppose that $\sum_{n=1}^{m} P_n = 1$, where $P_n = P\{N = n\}$. The random variable

$$Y = \sum_{j=1}^{N} X_j$$

is said to be a Coxian random variable. Conditioning on $N$ gives its density function:

$$\begin{aligned}
f_Y(t) &= \sum_{n=1}^{m} f_Y(t \mid N = n)\, P_n \\
&= \sum_{n=1}^{m} f_{X_1+\cdots+X_n}(t \mid N = n)\, P_n \\
&= \sum_{n=1}^{m} f_{X_1+\cdots+X_n}(t)\, P_n \\
&= \sum_{n=1}^{m} P_n \sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i t}
\end{aligned}$$

where the last line uses Equation (5.8). Let

$$r(n) = P\{N = n \mid N \geq n\}$$

If we interpret $N$ as a lifetime measured in discrete time periods, then $r(n)$ denotes the probability that an item will die in its $n$th period of use given that it has survived up to that time. Thus, $r(n)$ is the discrete time analog of the failure rate function $r(t)$, and is correspondingly referred to as the discrete time failure (or hazard) rate function.

Coxian random variables often arise in the following manner. Suppose that an item must go through $m$ stages of treatment to be cured. However, suppose that after each stage there is a probability that the item will quit the program. If we suppose that the amounts of time that it takes the item to pass through the successive stages are independent exponential random variables, and that the probability that an item that has just completed stage $n$ quits the program is (independent of how long it took to go through the $n$ stages) equal to $r(n)$, then the total time that an item spends in the program is a Coxian random variable.


5.3 The Poisson Process 


5.3.1 Counting Processes 


A stochastic process $\{N(t), t \geq 0\}$ is said to be a counting process if $N(t)$ represents the total number of “events” that occur by time $t$. Some examples of counting processes are the following:

(a) If we let $N(t)$ equal the number of persons who enter a particular store at or prior to time $t$, then $\{N(t), t \geq 0\}$ is a counting process in which an event corresponds to a person entering the store. Note that if we had let $N(t)$ equal the number of persons in the store at time $t$, then $\{N(t), t \geq 0\}$ would not be a counting process (why not?).

(b) If we say that an event occurs whenever a child is born, then $\{N(t), t \geq 0\}$ is a counting process when $N(t)$ equals the total number of people who were born by time $t$. (Does $N(t)$ include persons who have died by time $t$? Explain why it must.)

(c) If $N(t)$ equals the number of goals that a given soccer player scores by time $t$, then $\{N(t), t \geq 0\}$ is a counting process. An event of this process will occur whenever the soccer player scores a goal.

From its definition we see that for a counting process $N(t)$ must satisfy:

(i) $N(t) \geq 0$.

(ii) $N(t)$ is integer valued.

(iii) If $s < t$, then $N(s) \leq N(t)$.

(iv) For $s < t$, $N(t) - N(s)$ equals the number of events that occur in the interval $(s, t]$.

A counting process is said to possess independent increments if the numbers of events that occur in disjoint time intervals are independent. For example, this means that the number of events that occur by time 10 (that is, $N(10)$) must be independent of the number of events that occur between times 10 and 15 (that is, $N(15) - N(10)$).

The assumption of independent increments might be reasonable for example (a), but it probably would be unreasonable for example (b). The reason for this is that if in example (b) $N(t)$ is very large, then it is probable that there are many people alive at time $t$; this would lead us to believe that the number of new births between time $t$ and time $t + s$ would also tend to be large (that is, it does not seem reasonable that $N(t)$ is independent of $N(t + s) - N(t)$, and so $\{N(t), t \geq 0\}$ would not have independent increments in example (b)). The assumption of independent increments in example (c) would be justified if we believed that the soccer player’s chances of scoring a goal today do not depend on “how he’s been going.” It would not be justified if we believed in “hot streaks” or “slumps.”

A counting process is said to possess stationary increments if the distribution 
of the number of events that occur in any interval of time depends only on the 
length of the time interval. In other words, the process has stationary incre- 
ments if the number of events in the interval (s,s + t) has the same distribution 
for all s. 

The assumption of stationary increments would only be reasonable in exam- 
ple (a) if there were no times of day at which people were more likely to enter 
the store. Thus, for instance, if there was a rush hour (say, between 12 P.M. and 
1 P.M.) each day, then the stationarity assumption would not be justified. If we 
believed that the earth’s population is basically constant (a belief not held at 
present by most scientists), then the assumption of stationary increments might 
be reasonable in example (b). Stationary increments do not seem to be a reason- 
able assumption in example (c) since, for one thing, most people would agree that 
the soccer player would probably score more goals while in the age bracket 25-30 
than he would while in the age bracket 35—40. It may, however, be reasonable 
over a smaller time horizon, such as one year. 


5.3.2 Definition of the Poisson Process 


One of the most important counting processes is the Poisson process, which is 
defined as follows: 


Definition 5.1 The counting process $\{N(t), t \geq 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda > 0$, if

(i) $N(0) = 0$.
(ii) The process has independent increments.
(iii) The number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$. That is, for all $s, t \geq 0$

$$P\{N(t + s) - N(s) = n\} = e^{-\lambda t}\, \frac{(\lambda t)^n}{n!}, \qquad n = 0, 1, \ldots$$

Note that it follows from condition (iii) that a Poisson process has stationary increments and also that

$$E[N(t)] = \lambda t$$

which explains why $\lambda$ is called the rate of the process.

To determine if an arbitrary counting process is actually a Poisson process, 
we must show that conditions (i), (ii), and (iii) are satisfied. Condition (i), which 
simply states that the counting of events begins at time t = 0, and condition (ii) 
can usually be directly verified from our knowledge of the process. However, it is not at all clear how we would determine that condition (iii) is satisfied, and for this reason an equivalent definition of a Poisson process would be useful.

As a prelude to giving a second definition of a Poisson process we shall define the concept of a function $f(\cdot)$ being $o(h)$.


Definition 5.2 The function $f(\cdot)$ is said to be $o(h)$ if

$$\lim_{h \to 0} \frac{f(h)}{h} = 0$$

Example 5.12

(a) The function $f(x) = x^2$ is $o(h)$ since

$$\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h^2}{h} = \lim_{h \to 0} h = 0$$

(b) The function $f(x) = x$ is not $o(h)$ since

$$\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h}{h} = \lim_{h \to 0} 1 = 1 \neq 0$$

(c) If $f(\cdot)$ is $o(h)$ and $g(\cdot)$ is $o(h)$, then so is $f(\cdot) + g(\cdot)$. This follows since

$$\lim_{h \to 0} \frac{f(h) + g(h)}{h} = \lim_{h \to 0} \frac{f(h)}{h} + \lim_{h \to 0} \frac{g(h)}{h} = 0 + 0 = 0$$

(d) If $f(\cdot)$ is $o(h)$, then so is $g(\cdot) = c f(\cdot)$. This follows since

$$\lim_{h \to 0} \frac{c f(h)}{h} = c \lim_{h \to 0} \frac{f(h)}{h} = c \cdot 0 = 0$$

(e) From (c) and (d) it follows that any finite linear combination of functions, each of which is $o(h)$, is $o(h)$.

In order for the function $f(\cdot)$ to be $o(h)$ it is necessary that $f(h)/h$ go to zero as $h$ goes to zero. But if $h$ goes to zero, the only way for $f(h)/h$ to go to zero is for $f(h)$ to go to zero faster than $h$ does. That is, for $h$ small, $f(h)$ must be small compared with $h$.

We are now in a position to give an alternate definition of a Poisson process. 


Definition 5.3 The counting process $\{N(t), t \geq 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda > 0$, if

(i) $N(0) = 0$.
(ii) The process has stationary and independent increments.
(iii) $P\{N(h) = 1\} = \lambda h + o(h)$.
(iv) $P\{N(h) \geq 2\} = o(h)$.

Theorem 5.1 Definitions 5.1 and 5.3 are equivalent.


Proof. We show that Definition 5.3 implies Definition 5.1, and leave it to you to prove the reverse. To start, fix $u \geq 0$ and let

$$g(t) = E[\exp\{-uN(t)\}]$$

We derive a differential equation for $g(t)$ as follows:

$$\begin{aligned}
g(t + h) &= E[\exp\{-uN(t + h)\}] \\
&= E[\exp\{-uN(t)\} \exp\{-u(N(t + h) - N(t))\}] \\
&= E[\exp\{-uN(t)\}]\, E[\exp\{-u(N(t + h) - N(t))\}] \quad \text{by independent increments} \\
&= g(t)\, E[\exp\{-uN(h)\}] \quad \text{by stationary increments}
\end{aligned} \tag{5.10}$$

Now, assumptions (iii) and (iv) imply that

$$P\{N(h) = 0\} = 1 - \lambda h + o(h)$$

Hence, conditioning on whether $N(h) = 0$ or $N(h) = 1$ or $N(h) \geq 2$ yields

$$\begin{aligned}
E[\exp\{-uN(h)\}] &= 1 - \lambda h + o(h) + e^{-u}(\lambda h + o(h)) + o(h) \\
&= 1 - \lambda h + e^{-u} \lambda h + o(h)
\end{aligned} \tag{5.11}$$

Therefore, from Equations (5.10) and (5.11) we obtain

$$g(t + h) = g(t)(1 - \lambda h + e^{-u} \lambda h) + o(h)$$

implying that

$$\frac{g(t + h) - g(t)}{h} = g(t) \lambda (e^{-u} - 1) + \frac{o(h)}{h}$$

Letting $h \to 0$ gives

$$g'(t) = g(t) \lambda (e^{-u} - 1)$$

or, equivalently,

$$\frac{g'(t)}{g(t)} = \lambda (e^{-u} - 1)$$

Integrating, and using $g(0) = 1$, shows that

$$\log(g(t)) = \lambda t (e^{-u} - 1)$$

or

$$g(t) = \exp\{\lambda t (e^{-u} - 1)\}$$

That is, the Laplace transform of $N(t)$ evaluated at $u$ is $e^{\lambda t (e^{-u} - 1)}$. Since that is also the Laplace transform of a Poisson random variable with mean $\lambda t$, the result follows from the fact that the distribution of a nonnegative random variable is uniquely determined by its Laplace transform.

Figure 5.1 Subdivision of the interval $[0, t]$ into $k$ equal parts.


Remarks

(i) The result that $N(t)$ has a Poisson distribution is a consequence of the Poisson approximation to the binomial distribution (see Section 2.2.4). To see this, subdivide the interval $[0, t]$ into $k$ equal parts where $k$ is very large (Figure 5.1). Now it can be shown using axiom (iv) of Definition 5.3 that as $k$ increases to $\infty$ the probability of having two or more events in any of the $k$ subintervals goes to 0. Hence, $N(t)$ will (with a probability going to 1) just equal the number of subintervals in which an event occurs. However, by stationary and independent increments this number will have a binomial distribution with parameters $k$ and $p = \lambda t/k + o(t/k)$. Hence, by the Poisson approximation to the binomial we see by letting $k$ approach $\infty$ that $N(t)$ will have a Poisson distribution with mean equal to

$$\lim_{k \to \infty} k \left[ \lambda \frac{t}{k} + o\!\left(\frac{t}{k}\right) \right] = \lambda t + \lim_{k \to \infty} \frac{t\, o(t/k)}{t/k} = \lambda t$$

by using the definition of $o(h)$ and the fact that $t/k \to 0$ as $k \to \infty$.

(ii) The explicit assumption that the process has stationary increments can be eliminated from Definition 5.3 provided that we change assumptions (iii) and (iv) to require that for any $t$ the probability of one event in the interval $(t, t + h)$ is $\lambda h + o(h)$ and the probability of two or more events in that interval is $o(h)$. That is, assumptions (ii), (iii), and (iv) of Definition 5.3 can be replaced by

(ii) The process has independent increments.

(iii) $P\{N(t + h) - N(t) = 1\} = \lambda h + o(h)$.

(iv) $P\{N(t + h) - N(t) \geq 2\} = o(h)$.
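The binomial-to-Poisson limit in Remark (i) can be observed numerically. The sketch below is an illustration only: it drops the $o(t/k)$ term and takes the per-subinterval probability to be exactly $\lambda t/k$ (an assumption of this example), then measures the largest gap between the binomial and Poisson probabilities as $k$ grows.

```python
import math


def poisson_pmf(n, mean):
    """P{Poisson(mean) = n}."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def binomial_pmf(n, k, p):
    """P{Binomial(k, p) = n}."""
    return math.comb(k, n) * p ** n * (1 - p) ** (k - n)


def max_pmf_gap(rate, t, k, n_max=20):
    """Largest |binomial - Poisson| probability over n = 0..n_max,

    with k subintervals and success probability rate*t/k (o(t/k) term ignored).
    """
    p = rate * t / k
    return max(abs(binomial_pmf(n, k, p) - poisson_pmf(n, rate * t))
               for n in range(n_max + 1))


if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        print(k, max_pmf_gap(rate=2.0, t=1.5, k=k))
```

The gap shrinks roughly in proportion to $1/k$, in line with the limiting argument.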


5.3.3 Interarrival and Waiting Time Distributions 


Consider a Poisson process, and let us denote the time of the first event by $T_1$. Further, for $n > 1$, let $T_n$ denote the elapsed time between the $(n - 1)$st and the $n$th event. The sequence $\{T_n, n = 1, 2, \ldots\}$ is called the sequence of interarrival times. For instance, if $T_1 = 5$ and $T_2 = 10$, then the first event of the Poisson process would have occurred at time 5 and the second at time 15.

We shall now determine the distribution of the $T_n$. To do so, we first note that the event $\{T_1 > t\}$ takes place if and only if no events of the Poisson process occur in the interval $[0, t]$ and thus,

$$P\{T_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$

Hence, $T_1$ has an exponential distribution with mean $1/\lambda$. Now,

$$P\{T_2 > t\} = E[P\{T_2 > t \mid T_1\}]$$

However,

$$\begin{aligned}
P\{T_2 > t \mid T_1 = s\} &= P\{0 \text{ events in } (s, s + t] \mid T_1 = s\} \\
&= P\{0 \text{ events in } (s, s + t]\} \\
&= e^{-\lambda t}
\end{aligned} \tag{5.12}$$

where the last two equations followed from independent and stationary increments. Therefore, from Equation (5.12) we conclude that $T_2$ is also an exponential random variable with mean $1/\lambda$ and, furthermore, that $T_2$ is independent of $T_1$. Repeating the same argument yields the following.

Proposition 5.1 $T_n$, $n = 1, 2, \ldots$, are independent identically distributed exponential random variables having mean $1/\lambda$.


Remark The proposition should not surprise us. The assumption of stationary 
and independent increments is basically equivalent to asserting that, at any point 
in time, the process probabilistically restarts itself. That is, the process from any 
point on is independent of all that has previously occurred (by independent incre- 
ments), and also has the same distribution as the original process (by stationary 
increments). In other words, the process has no memory, and hence exponential 
interarrival times are to be expected. 

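Another quantity of interest is $S_n$, the arrival time of the $n$th event, also called the waiting time until the $n$th event. It is easily seen that

$$S_n = \sum_{i=1}^{n} T_i, \qquad n \geq 1$$

and hence from Proposition 5.1 and the results of Section 2.2 it follows that $S_n$ has a gamma distribution with parameters $n$ and $\lambda$. That is, the probability density of $S_n$ is given by

$$f_{S_n}(t) = \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n - 1)!}, \qquad t \geq 0 \tag{5.13}$$

Equation (5.13) may also be derived by noting that the $n$th event will occur prior to or at time $t$ if and only if the number of events occurring by time $t$ is at least $n$. That is,

$$N(t) \geq n \iff S_n \leq t$$

Hence,

$$F_{S_n}(t) = P\{S_n \leq t\} = P\{N(t) \geq n\} = \sum_{j=n}^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^j}{j!}$$

which, upon differentiation, yields

$$\begin{aligned}
f_{S_n}(t) &= -\sum_{j=n}^{\infty} \lambda e^{-\lambda t}\, \frac{(\lambda t)^j}{j!} + \sum_{j=n}^{\infty} \lambda e^{-\lambda t}\, \frac{(\lambda t)^{j-1}}{(j - 1)!} \\
&= \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n - 1)!} + \sum_{j=n+1}^{\infty} \lambda e^{-\lambda t}\, \frac{(\lambda t)^{j-1}}{(j - 1)!} - \sum_{j=n}^{\infty} \lambda e^{-\lambda t}\, \frac{(\lambda t)^j}{j!} \\
&= \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n - 1)!}
\end{aligned}$$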

Example 5.13 Suppose that people immigrate into a territory at a Poisson rate $\lambda = 1$ per day.

(a) What is the expected time until the tenth immigrant arrives?

(b) What is the probability that the elapsed time between the tenth and the eleventh arrival exceeds two days?

Solution:
(a) $E[S_{10}] = 10/\lambda = 10$ days.
(b) $P\{T_{11} > 2\} = e^{-2\lambda} = e^{-2} \approx 0.135$.


Proposition 5.1 also gives us another way of defining a Poisson process. Suppose we start with a sequence $\{T_n, n \geq 1\}$ of independent identically distributed exponential random variables each having mean $1/\lambda$. Now let us define a counting process by saying that the $n$th event of this process occurs at time

$$S_n \equiv T_1 + T_2 + \cdots + T_n$$

The resultant counting process $\{N(t), t \geq 0\}$* will be Poisson with rate $\lambda$.

* A formal definition of $N(t)$ is given by $N(t) \equiv \max\{n : S_n \leq t\}$ where $S_0 \equiv 0$.

Remark Another way of obtaining the density function of $S_n$ is to note that because $S_n$ is the time of the $n$th event,

$$\begin{aligned}
P\{t < S_n < t + h\} &= P\{N(t) = n - 1, \text{ one event in } (t, t + h)\} + o(h) \\
&= P\{N(t) = n - 1\}\, P\{\text{one event in } (t, t + h)\} + o(h) \\
&= e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n - 1)!}\, [\lambda h + o(h)] + o(h) \\
&= \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n - 1)!}\, h + o(h)
\end{aligned}$$

where the first equality uses the fact that the probability of 2 or more events in $(t, t + h)$ is $o(h)$. If we now divide both sides of the preceding equation by $h$ and then let $h \to 0$, we obtain

$$f_{S_n}(t) = \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n - 1)!}$$
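The construction of a Poisson process from independent exponential interarrival times translates directly into a simulation. The following sketch uses Python's `random.expovariate`; the function names, the seed, and the rate and horizon values are illustrative assumptions.

```python
import random


def poisson_arrival_times(rate, horizon, rng):
    """Arrival times S_1 < S_2 < ... that fall in [0, horizon].

    Each S_n is a running sum of i.i.d. exponential interarrival times T_i
    with mean 1/rate.
    """
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def count_by(times, t):
    """N(t) = max{n : S_n <= t}, with S_0 = 0, for a sorted list of arrival times."""
    return sum(1 for s in times if s <= t)


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so the run is repeatable
    rate, horizon = 2.0, 5.0
    counts = [len(poisson_arrival_times(rate, horizon, rng)) for _ in range(4000)]
    print(sum(counts) / len(counts), "vs rate * horizon =", rate * horizon)
```

Averaged over many runs, the number of arrivals in the horizon should be close to the rate times the horizon length, consistent with $E[N(t)] = \lambda t$.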


5.3.4 Further Properties of Poisson Processes 


Consider a Poisson process $\{N(t), t \geq 0\}$ having rate $\lambda$, and suppose that each time an event occurs it is classified as either a type I or a type II event. Suppose further that each event is classified as a type I event with probability $p$ or a type II event with probability $1 - p$, independently of all other events. For example, suppose that customers arrive at a store in accordance with a Poisson process having rate $\lambda$; and suppose that each arrival is male with probability $\frac{1}{2}$ and female with probability $\frac{1}{2}$. Then a type I event would correspond to a male arrival and a type II event to a female arrival.

Let $N_1(t)$ and $N_2(t)$ denote respectively the number of type I and type II events occurring in $[0, t]$. Note that $N(t) = N_1(t) + N_2(t)$.

Proposition 5.2 $\{N_1(t), t \geq 0\}$ and $\{N_2(t), t \geq 0\}$ are both Poisson processes having respective rates $\lambda p$ and $\lambda(1 - p)$. Furthermore, the two processes are independent.


Proof. It is easy to verify that $\{N_1(t), t \geq 0\}$ is a Poisson process with rate $\lambda p$ by verifying that it satisfies Definition 5.3.

* $N_1(0) = 0$ follows from the fact that $N(0) = 0$.

* It is easy to see that $\{N_1(t), t \geq 0\}$ inherits the stationary and independent increment properties of the process $\{N(t), t \geq 0\}$. This is true because the distribution of the number of type I events in an interval can be obtained by conditioning on the number of events in that interval, and the distribution of this latter quantity depends only on the length of the interval and is independent of what has occurred in any nonoverlapping interval.

* $P\{N_1(h) = 1\} = P\{N_1(h) = 1 \mid N(h) = 1\}\, P\{N(h) = 1\} + P\{N_1(h) = 1 \mid N(h) \geq 2\}\, P\{N(h) \geq 2\} = p(\lambda h + o(h)) + o(h) = \lambda p h + o(h)$

* $P\{N_1(h) \geq 2\} \leq P\{N(h) \geq 2\} = o(h)$

Thus we see that $\{N_1(t), t \geq 0\}$ is a Poisson process with rate $\lambda p$ and, by a similar argument, that $\{N_2(t), t \geq 0\}$ is a Poisson process with rate $\lambda(1 - p)$. Because the probability of a type I event in the interval from $t$ to $t + h$ is independent of all that occurs in intervals that do not overlap $(t, t + h)$, it is independent of knowledge of when type II events occur, showing that the two Poisson processes are independent. (For another way of proving independence, see Example 3.23.)


Example 5.14 If immigrants to area A arrive at a Poisson rate of ten per week, and if each immigrant is of English descent with probability $\frac{1}{12}$, then what is the probability that no people of English descent will emigrate to area A during the month of February?

Solution: By the previous proposition it follows that the number of Englishmen emigrating to area A during the month of February is Poisson distributed with mean $4 \cdot 10 \cdot \frac{1}{12} = \frac{10}{3}$. Hence, the desired probability is $e^{-10/3}$.
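The answer to Example 5.14 can be cross-checked without Proposition 5.2 by conditioning on the total number of arrivals: if the total is Poisson with mean 40 and each arrival is independently of type I with probability 1/12, the number of type I arrivals should again be Poisson, with mean 10/3. The sketch below performs that sum directly; the truncation point `n_max` is an assumption chosen so that the neglected tail is negligible.

```python
import math


def poisson_pmf(n, mean):
    """P{Poisson(mean) = n}, computed in log space to avoid overflow for large n."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def type1_pmf_by_conditioning(k, total_mean, p, n_max=200):
    """P{N_1 = k} = sum_n P{N = n} * C(n, k) * p^k * (1-p)^(n-k), truncated at n_max."""
    total = 0.0
    for n in range(k, n_max + 1):
        total += poisson_pmf(n, total_mean) * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


if __name__ == "__main__":
    total_mean, p = 4 * 10, 1 / 12  # four weeks at ten per week, as in Example 5.14
    print(type1_pmf_by_conditioning(0, total_mean, p), math.exp(-10 / 3))
```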


Example 5.15 Suppose nonnegative offers to buy an item that you want to sell arrive according to a Poisson process with rate $\lambda$. Assume that each offer is the value of a continuous random variable having density function $f(x)$. Once the offer is presented to you, you must either accept it or reject it and wait for the next offer. We suppose that you incur costs at a rate $c$ per unit time until the item is sold, and that your objective is to maximize your expected total return, where the total return is equal to the amount received minus the total cost incurred. Suppose you employ the policy of accepting the first offer that is greater than some specified value $y$. (Such a type of policy, which we call a $y$-policy, can be shown to be optimal.) What is the best value of $y$?

Solution: Let us compute the expected total return when you use the $y$-policy, and then choose the value of $y$ that maximizes this quantity. Let $X$ denote the value of a random offer, and let $\bar{F}(x) = P\{X > x\} = \int_x^\infty f(u)\, du$ be its tail distribution function. Because each offer will be greater than $y$ with probability $\bar{F}(y)$, it follows that such offers occur according to a Poisson process with rate $\lambda \bar{F}(y)$. Hence, the time until an offer is accepted is an exponential random variable with rate $\lambda \bar{F}(y)$. Letting $R(y)$ denote the total return from the policy that accepts the first offer that is greater than $y$, we have

$$\begin{aligned}
E[R(y)] &= E[\text{accepted offer}] - c\, E[\text{time to accept}] \\
&= E[X \mid X > y] - \frac{c}{\lambda \bar{F}(y)} \\
&= \int_0^\infty x f_{X \mid X > y}(x)\, dx - \frac{c}{\lambda \bar{F}(y)} \\
&= \int_y^\infty x\, \frac{f(x)}{\bar{F}(y)}\, dx - \frac{c}{\lambda \bar{F}(y)} \\
&= \frac{\int_y^\infty x f(x)\, dx - c/\lambda}{\bar{F}(y)}
\end{aligned}$$

Differentiation yields

$$\frac{d}{dy} E[R(y)] = 0 \iff -\bar{F}(y)\, y f(y) + \left( \int_y^\infty x f(x)\, dx - \frac{c}{\lambda} \right) f(y) = 0$$

Therefore, the optimal value of $y$ satisfies

$$y \bar{F}(y) = \int_y^\infty x f(x)\, dx - \frac{c}{\lambda}$$

or

$$y \int_y^\infty f(x)\, dx = \int_y^\infty x f(x)\, dx - \frac{c}{\lambda}$$

or

$$\int_y^\infty (x - y) f(x)\, dx = \frac{c}{\lambda} \tag{5.14}$$

We now argue that the left-hand side of the preceding is a nonincreasing function of $y$. To do so, note that, with $a^+$ defined to equal $a$ if $a > 0$ or to equal 0 otherwise, we have

$$\int_y^\infty (x - y) f(x)\, dx = E[(X - y)^+]$$

Because $(X - y)^+$ is a nonincreasing function of $y$, so is its expectation, thus showing that the left-hand side of Equation (5.14) is a nonincreasing function of $y$. Consequently, if $E[X] < c/\lambda$, in which case there is no solution of Equation (5.14), then it is optimal to accept any offer; otherwise, the optimal value $y$ is the unique solution of Equation (5.14).
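Because the left-hand side of Equation (5.14) is nonincreasing in $y$, the optimal threshold can be found by bisection. The sketch below assumes, purely for illustration, that offers are exponential with rate `beta`, in which case $E[(X - y)^+] = e^{-\beta y}/\beta$ and the solution has the closed form $y = -\ln(c\beta/\lambda)/\beta$; the function names and parameter values are hypothetical.

```python
import math


def expected_excess_exponential(y, beta):
    """E[(X - y)^+] when X is exponential with rate beta (an assumed offer distribution)."""
    return math.exp(-beta * y) / beta


def optimal_threshold(expected_excess, target, upper, tol=1e-10):
    """Solve expected_excess(y) = target on [0, upper] by bisection.

    expected_excess must be nonincreasing, and upper must be large enough that
    expected_excess(upper) < target. Returns 0 when E[X] <= target, the case in
    which every offer should be accepted.
    """
    if expected_excess(0.0) <= target:
        return 0.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(mid) > target:
            lo = mid  # excess still too large: the threshold must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    beta, c, lam = 1.0, 0.1, 2.0  # hypothetical offer rate, holding cost, arrival rate
    y_star = optimal_threshold(lambda y: expected_excess_exponential(y, beta),
                               target=c / lam, upper=50.0)
    print(y_star, -math.log(c * beta / lam) / beta)
```

For other offer densities only `expected_excess` needs to change, for example to a numerical integral of $(x - y) f(x)$.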


It follows from Proposition 5.2 that if each of a Poisson number of individuals 
is independently classified into one of two possible groups with respective proba- 
bilities p and 1 — p, then the number of individuals in each of the two groups will 
be independent Poisson random variables. Because this result easily generalizes to
the case where the classification is into any one of r possible groups, we have the
following application to a model of employees moving about in an organization.


Example 5.16 Consider a system in which individuals at any time are classified as being in one of $r$ possible states, and assume that an individual changes states in accordance with a Markov chain having transition probabilities $P_{ij}$, $i, j = 1, \ldots, r$. That is, if an individual is in state $i$ during a time period then, independently of its previous states, it will be in state $j$ during the next time period with probability $P_{ij}$. The individuals are assumed to move through the system independently of each other. Suppose that the numbers of people initially in states $1, 2, \ldots, r$ are independent Poisson random variables with respective means $\lambda_1, \lambda_2, \ldots, \lambda_r$. We are interested in determining the joint distribution of the numbers of individuals in states $1, 2, \ldots, r$ at some time $n$.

Solution: For fixed $i$, let $N_j(i)$, $j = 1, \ldots, r$ denote the number of those individuals, initially in state $i$, that are in state $j$ at time $n$. Now each of the (Poisson distributed) number of people initially in state $i$ will, independently of each other, be in state $j$ at time $n$ with probability $P^n_{ij}$, where $P^n_{ij}$ is the $n$-stage transition probability for the Markov chain having transition probabilities $P_{ij}$. Hence, the $N_j(i)$, $j = 1, \ldots, r$ will be independent Poisson random variables with respective means $\lambda_i P^n_{ij}$, $j = 1, \ldots, r$. Because the sum of independent Poisson random variables is itself a Poisson random variable, it follows that the numbers of individuals in state $j$ at time $n$, namely $\sum_{i=1}^{r} N_j(i)$, will be independent Poisson random variables with respective means $\sum_{i=1}^{r} \lambda_i P^n_{ij}$, for $j = 1, \ldots, r$.

Example 5.17 (The Coupon Collecting Problem) There are $m$ different types of coupons. Each time a person collects a coupon it is, independently of ones previously obtained, a type $j$ coupon with probability $p_j$, $\sum_{j=1}^{m} p_j = 1$. Let $N$ denote the number of coupons one needs to collect in order to have a complete collection of at least one of each type. Find $E[N]$.

Solution: If we let $N_j$ denote the number one must collect to obtain a type $j$ coupon, then we can express $N$ as

$$N = \max_{1 \leq j \leq m} N_j$$

However, even though each $N_j$ is geometric with parameter $p_j$, the foregoing representation of $N$ is not that useful, because the random variables $N_j$ are not independent.

We can, however, transform the problem into one of determining the expected value of the maximum of independent random variables. To do so, suppose that coupons are collected at times chosen according to a Poisson process with rate $\lambda = 1$. Say that an event of this Poisson process is of type $j$, $1 \leq j \leq m$, if the coupon obtained at that time is a type $j$ coupon. If we now let $N_j(t)$ denote the number of type $j$ coupons collected by time $t$, then it follows from Proposition 5.2 that $\{N_j(t), t \geq 0\}$, $j = 1, \ldots, m$ are independent Poisson processes with respective rates $\lambda p_j = p_j$. Let $X_j$ denote the time of the first event of the $j$th process, and let

$$X = \max_{1 \leq j \leq m} X_j$$

denote the time at which a complete collection is amassed. Since the $X_j$ are independent exponential random variables with respective rates $p_j$, it follows that

$$\begin{aligned}
P\{X < t\} &= P\{\max_{1 \leq j \leq m} X_j < t\} \\
&= P\{X_j < t, \text{ for } j = 1, \ldots, m\} \\
&= \prod_{j=1}^{m} (1 - e^{-p_j t})
\end{aligned}$$

Therefore,

$$\begin{aligned}
E[X] &= \int_0^\infty P\{X > t\}\, dt \\
&= \int_0^\infty \left\{ 1 - \prod_{j=1}^{m} (1 - e^{-p_j t}) \right\} dt
\end{aligned} \tag{5.15}$$

It remains to relate $E[X]$, the expected time until one has a complete set, to $E[N]$, the expected number of coupons it takes. This can be done by letting $T_i$ denote the $i$th interarrival time of the Poisson process that counts the number of coupons obtained. Then it is easy to see that

$$X = \sum_{i=1}^{N} T_i$$

Since the $T_i$ are independent exponentials with rate 1, and $N$ is independent of the $T_i$, we see that

$$E[X \mid N] = N E[T_i] = N$$

Therefore,

$$E[X] = E[N]$$

and so $E[N]$ is as given in Equation (5.15).

Let us now compute the expected number of types that appear only once in the complete collection. Letting $I_i$ equal 1 if there is only a single type $i$ coupon in the final set, and letting it equal 0 otherwise, we thus want

$$E\left[ \sum_{i=1}^{m} I_i \right] = \sum_{i=1}^{m} E[I_i] = \sum_{i=1}^{m} P\{I_i = 1\}$$

Now there will be a single type $i$ coupon in the final set if a coupon of each type has appeared before the second coupon of type $i$ is obtained. Thus, letting $S_i$ denote the time at which the second type $i$ coupon is obtained, we have

$$P\{I_i = 1\} = P\{X_j < S_i, \text{ for all } j \neq i\}$$

Using that $S_i$ has a gamma distribution with parameters $(2, p_i)$, this yields

$$\begin{aligned}
P\{I_i = 1\} &= \int_0^\infty P\{X_j < S_i \text{ for all } j \neq i \mid S_i = x\}\, p_i e^{-p_i x}\, p_i x\, dx \\
&= \int_0^\infty P\{X_j < x, \text{ for all } j \neq i\}\, p_i^2\, x e^{-p_i x}\, dx \\
&= \int_0^\infty \prod_{j \neq i} (1 - e^{-p_j x})\, p_i^2\, x e^{-p_i x}\, dx
\end{aligned}$$

Therefore, we have the result

$$\begin{aligned}
E\left[ \sum_{i=1}^{m} I_i \right] &= \int_0^\infty \sum_{i=1}^{m} \prod_{j \neq i} (1 - e^{-p_j x})\, p_i^2\, x e^{-p_i x}\, dx \\
&= \int_0^\infty x \prod_{j=1}^{m} (1 - e^{-p_j x}) \sum_{i=1}^{m} p_i^2\, \frac{e^{-p_i x}}{1 - e^{-p_i x}}\, dx
\end{aligned}$$
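Equation (5.15) can be evaluated exactly for small $m$. Expanding the product by inclusion/exclusion and integrating term by term gives $E[N] = \sum_{S \neq \emptyset} (-1)^{|S|+1} / \sum_{j \in S} p_j$, where the sum runs over nonempty subsets $S$ of coupon types; this expansion is a step added here for the illustration rather than one taken in the text. A minimal sketch:

```python
from itertools import combinations


def expected_coupons(probs):
    """E[N] from Equation (5.15), expanded by inclusion/exclusion.

    Sums (-1)^(|S|+1) / sum_{j in S} p_j over all nonempty subsets S of types.
    The number of subsets is 2^m - 1, so this is only practical for small m.
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total


if __name__ == "__main__":
    m = 3
    print(expected_coupons([1.0 / m] * m))       # equal probabilities
    print(expected_coupons([0.5, 0.3, 0.2]))     # hypothetical unequal probabilities
```

With equal probabilities $p_j = 1/m$ the value reduces to $m(1 + 1/2 + \cdots + 1/m)$, which gives 5.5 for $m = 3$.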


The next probability calculation related to Poisson processes that we shall determine is the probability that $n$ events occur in one Poisson process before $m$ events have occurred in a second and independent Poisson process. More formally let $\{N_1(t), t \geq 0\}$ and $\{N_2(t), t \geq 0\}$ be two independent Poisson processes having respective rates $\lambda_1$ and $\lambda_2$. Also, let $S_n^1$ denote the time of the $n$th event of the first process, and $S_m^2$ the time of the $m$th event of the second process. We seek

$$P\{S_n^1 < S_m^2\}$$

Before attempting to calculate this for general $n$ and $m$, let us consider the special case $n = m = 1$. Since $S_1^1$, the time of the first event of the $N_1(t)$ process, and $S_1^2$, the time of the first event of the $N_2(t)$ process, are both exponentially distributed random variables (by Proposition 5.1) with respective means $1/\lambda_1$ and $1/\lambda_2$, it follows from Section 5.2.3 that

$$P\{S_1^1 < S_1^2\} = \frac{\lambda_1}{\lambda_1 + \lambda_2} \tag{5.16}$$

Let us now consider the probability that two events occur in the $N_1(t)$ process before a single event has occurred in the $N_2(t)$ process. That is, $P\{S_2^1 < S_1^2\}$. To calculate this we reason as follows: In order for the $N_1(t)$ process to have two events before a single event occurs in the $N_2(t)$ process, it is first necessary for the initial event that occurs to be an event of the $N_1(t)$ process (and this occurs, by Equation (5.16), with probability $\lambda_1/(\lambda_1 + \lambda_2)$). Now, given that the initial event is from the $N_1(t)$ process, the next thing that must occur for $S_2^1$ to be less than $S_1^2$ is for the second event also to be an event of the $N_1(t)$ process. However, when the first event occurs both processes start all over again (by the memoryless property of Poisson processes) and hence this conditional probability is also $\lambda_1/(\lambda_1 + \lambda_2)$; thus, the desired probability is given by

$$P\{S_2^1 < S_1^2\} = \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^2$$

In fact, this reasoning shows that each event that occurs is going to be an event of the $N_1(t)$ process with probability $\lambda_1/(\lambda_1 + \lambda_2)$ or an event of the $N_2(t)$ process with probability $\lambda_2/(\lambda_1 + \lambda_2)$, independent of all that has previously occurred. In other words, the probability that the $N_1(t)$ process reaches $n$ before the $N_2(t)$ process reaches $m$ is just the probability that $n$ heads will appear before $m$ tails if one flips a coin having probability $p = \lambda_1/(\lambda_1 + \lambda_2)$ of a head appearing. But by noting that this event will occur if and only if the first $n + m - 1$ tosses result in $n$ or more heads, we see that our desired probability is given by

$$P\{S_n^1 < S_m^2\} = \sum_{k=n}^{n+m-1} \binom{n + m - 1}{k} \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^k \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} \right)^{n+m-1-k}$$
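The binomial sum above is straightforward to evaluate. The following sketch (function name and example rates are illustrative) computes it and recovers the two special cases worked out earlier, $n = m = 1$ and $n = 2$, $m = 1$.

```python
import math


def prob_n_before_m(n, m, rate1, rate2):
    """P{S_n^1 < S_m^2}: n events of process 1 occur before m events of process 2.

    Equals the probability of at least n heads in n + m - 1 tosses of a coin
    with head probability rate1 / (rate1 + rate2).
    """
    p = rate1 / (rate1 + rate2)
    tosses = n + m - 1
    return sum(math.comb(tosses, k) * p ** k * (1 - p) ** (tosses - k)
               for k in range(n, tosses + 1))


if __name__ == "__main__":
    rate1, rate2 = 1.0, 3.0  # hypothetical rates
    print(prob_n_before_m(1, 1, rate1, rate2))  # rate1 / (rate1 + rate2)
    print(prob_n_before_m(2, 1, rate1, rate2))  # (rate1 / (rate1 + rate2)) ** 2
```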


5.3.5 Conditional Distribution of the Arrival Times 


Suppose we are told that exactly one event of a Poisson process has taken place by time $t$, and we are asked to determine the distribution of the time at which the event occurred. Now, since a Poisson process possesses stationary and independent increments it seems reasonable that each interval in $[0, t]$ of equal length should have the same probability of containing the event. In other words, the time of the event should be uniformly distributed over $[0, t]$. This is easily checked since, for $s \leq t$,

$$\begin{aligned}
P\{T_1 < s \mid N(t) = 1\} &= \frac{P\{T_1 < s, N(t) = 1\}}{P\{N(t) = 1\}} \\
&= \frac{P\{1 \text{ event in } [0, s),\ 0 \text{ events in } [s, t]\}}{P\{N(t) = 1\}} \\
&= \frac{P\{1 \text{ event in } [0, s)\}\, P\{0 \text{ events in } [s, t]\}}{P\{N(t) = 1\}} \\
&= \frac{\lambda s e^{-\lambda s}\, e^{-\lambda (t - s)}}{\lambda t e^{-\lambda t}} \\
&= \frac{s}{t}
\end{aligned}$$

This result may be generalized, but before doing so we need to introduce the concept of order statistics.

Let $Y_1, Y_2, \ldots, Y_n$ be $n$ random variables. We say that $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ are the order statistics corresponding to $Y_1, Y_2, \ldots, Y_n$ if $Y_{(k)}$ is the $k$th smallest value among $Y_1, \ldots, Y_n$, $k = 1, 2, \ldots, n$. For instance, if $n = 3$ and $Y_1 = 4$, $Y_2 = 5$, $Y_3 = 1$ then $Y_{(1)} = 1$, $Y_{(2)} = 4$, $Y_{(3)} = 5$. If the $Y_i$, $i = 1, \ldots, n$, are independent identically distributed continuous random variables with probability density $f$, then the joint density of the order statistics $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ is given by

$$f(y_1, y_2, \ldots, y_n) = n! \prod_{i=1}^{n} f(y_i), \qquad y_1 < y_2 < \cdots < y_n$$

The preceding follows since

(i) $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ will equal $(y_1, y_2, \ldots, y_n)$ if $(Y_1, Y_2, \ldots, Y_n)$ is equal to any of the $n!$ permutations of $(y_1, y_2, \ldots, y_n)$;

and

(ii) the probability density that $(Y_1, Y_2, \ldots, Y_n)$ is equal to $y_{i_1}, \ldots, y_{i_n}$ is $\prod_{j=1}^{n} f(y_{i_j}) = \prod_{j=1}^{n} f(y_j)$ when $i_1, \ldots, i_n$ is a permutation of $1, 2, \ldots, n$.

If the $Y_i$, $i = 1, \ldots, n$, are uniformly distributed over $(0, t)$, then we obtain from the preceding that the joint density function of the order statistics $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ is

$$f(y_1, y_2, \ldots, y_n) = \frac{n!}{t^n}, \qquad 0 < y_1 < y_2 < \cdots < y_n < t$$
We are now ready for the following useful theorem. 
Theorem 5.2 Given that N(t) = n, the n arrival times $S_1, \ldots, S_n$ have the same
distribution as the order statistics corresponding to n independent random variables
uniformly distributed on the interval (0, t).


Proof. To obtain the conditional density of $S_1, \ldots, S_n$ given that N(t) = n note
that for $0 < s_1 < \cdots < s_n < t$ the event that $S_1 = s_1, S_2 = s_2, \ldots, S_n = s_n$, N(t) = n
is equivalent to the event that the first n + 1 interarrival times satisfy $T_1 = s_1$,
$T_2 = s_2 - s_1, \ldots, T_n = s_n - s_{n-1}$, $T_{n+1} > t - s_n$. Hence, using Proposition 5.1,
we have that the conditional joint density of $S_1, \ldots, S_n$ given that N(t) = n is as
follows:

$$
\begin{aligned}
f(s_1, \ldots, s_n \mid n) &= \frac{f(s_1, \ldots, s_n, n)}{P\{N(t) = n\}} \\
&= \frac{\lambda e^{-\lambda s_1}\, \lambda e^{-\lambda (s_2 - s_1)} \cdots \lambda e^{-\lambda (s_n - s_{n-1})}\, e^{-\lambda (t - s_n)}}{e^{-\lambda t} (\lambda t)^n / n!} \\
&= \frac{n!}{t^n}, \qquad 0 < s_1 < \cdots < s_n < t
\end{aligned}
$$

which proves the result. ■


Remark The preceding result is usually paraphrased as stating that, under the
condition that n events have occurred in (0, t), the times $S_1, \ldots, S_n$ at which events
occur, considered as unordered random variables, are distributed independently
and uniformly in the interval (0, t).
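
Theorem 5.2 can be checked numerically. The following is a minimal simulation sketch, not part of the original development: the function names, the parameter values (rate 1, t = 3, n = 3), and the seed are illustrative assumptions. It uses the standard fact that the kth order statistic of n independent uniform (0, t) random variables has mean kt/(n + 1), and compares this with the average arrival times over simulated paths that happen to have exactly n events by time t.

```python
import random

def arrival_times(rate, t, rng):
    """Arrival times in (0, t) of a Poisson process, built from exponential interarrival times."""
    times = []
    s = rng.expovariate(rate)
    while s < t:
        times.append(s)
        s += rng.expovariate(rate)
    return times

def conditional_mean_arrivals(rate, t, n, runs, seed=0):
    """Average S_1, ..., S_n over simulated paths with exactly n events by time t."""
    rng = random.Random(seed)
    sums = [0.0] * n
    kept = 0
    for _ in range(runs):
        times = arrival_times(rate, t, rng)
        if len(times) == n:          # keep only paths with N(t) = n
            kept += 1
            for k, s in enumerate(times):
                sums[k] += s
    return [total / kept for total in sums], kept

# Illustrative values only (assumptions, not from the text).
t, n = 3.0, 3
means, kept = conditional_mean_arrivals(rate=1.0, t=t, n=n, runs=40000)
expected = [k * t / (n + 1) for k in range(1, n + 1)]   # means of uniform order statistics
print(means, expected)
```

The simulated averages should be close to 0.75, 1.5, and 2.25, up to Monte Carlo error.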


Application of Theorem 5.2 (Sampling a Poisson Process) In Proposition 5.2
we showed that if each event of a Poisson process is independently classified as
a type I event with probability p and as a type II event with probability 1 - p
then the counting processes of type I and type II events are independent Poisson
processes with respective rates λp and λ(1 - p). Suppose now, however, that there
are k possible types of events and that the probability that an event is classified
as a type i event, i = 1, ..., k, depends on the time the event occurs. Specifically,
suppose that if an event occurs at time y then it will be classified as a type i
event, independently of anything that has previously occurred, with probability
$P_i(y)$, i = 1, ..., k, where $\sum_{i=1}^{k} P_i(y) = 1$. Upon using Theorem 5.2 we can prove
the following useful proposition.


Proposition 5.3 If $N_i(t)$, i = 1, ..., k, represents the number of type i events
occurring by time t then $N_i(t)$, i = 1, ..., k, are independent Poisson random
variables having means

$$
E[N_i(t)] = \lambda \int_0^t P_i(s)\, ds
$$


Before proving this proposition, let us first illustrate its use. 


Example 5.18 (An Infinite Server Queue) Suppose that customers arrive at a
service station in accordance with a Poisson process with rate λ. Upon arrival
the customer is immediately served by one of an infinite number of possible
servers, and the service times are assumed to be independent with a common
distribution G. What is the distribution of X(t), the number of customers that
have completed service by time t? What is the distribution of Y(t), the number
of customers that are being served at time t?

To answer the preceding questions let us agree to call an entering customer a
type I customer if he completes his service by time t and a type II customer if he
does not complete his service by time t. Now, if the customer enters at time s,
s ≤ t, then he will be a type I customer if his service time is less than t - s.
Since the service time distribution is G, the probability of this will be G(t - s).
Similarly, a customer entering at time s, s ≤ t, will be a type II customer with
probability $\bar{G}(t - s) = 1 - G(t - s)$. Hence, from Proposition 5.3 we obtain that
the distribution of X(t), the number of customers that have completed service by
time t, is Poisson distributed with mean

$$
E[X(t)] = \lambda \int_0^t G(t - s)\, ds = \lambda \int_0^t G(y)\, dy \tag{5.17}
$$

Similarly, the distribution of Y(t), the number of customers being served at time t,
is Poisson with mean

$$
E[Y(t)] = \lambda \int_0^t \bar{G}(t - s)\, ds = \lambda \int_0^t \bar{G}(y)\, dy \tag{5.18}
$$

Furthermore, X(t) and Y(t) are independent.
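
As a rough numerical illustration of (5.17) and (5.18), the sketch below evaluates both means by a midpoint rule. The exponential service distribution, the parameter values, and the helper names are assumptions made only for this example; any distribution function G could be passed in instead.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def infinite_server_means(lam, G, t):
    """Return (E[X(t)], E[Y(t)]) as given by (5.17) and (5.18)."""
    mean_completed = lam * integrate(G, 0.0, t)
    mean_in_service = lam * integrate(lambda y: 1.0 - G(y), 0.0, t)
    return mean_completed, mean_in_service

# Assumed example: exponential service times with rate nu.
lam, nu, t = 3.0, 0.5, 4.0
G = lambda y: 1.0 - math.exp(-nu * y)
mean_x, mean_y = infinite_server_means(lam, G, t)
print(mean_x, mean_y, mean_x + mean_y)
```

Since every customer who has arrived by time t is either finished or still in service, the two means add up to λt, which gives a quick sanity check on the output.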

Suppose now that we are interested in computing the joint distribution of Y(t)
and Y(t + s); that is, the joint distribution of the number in the system at time t
and at time t + s. To accomplish this, say that an arrival is

type 1: if he arrives before time t and completes service between t and t + s,
type 2: if he arrives before t and completes service after t + s,
type 3: if he arrives between t and t + s and completes service after t + s,
type 4: otherwise.


Hence, an arrival at time y will be type i with probability $P_i(y)$ given by

$$
P_1(y) = \begin{cases} G(t + s - y) - G(t - y), & \text{if } y < t \\ 0, & \text{otherwise} \end{cases}
$$

$$
P_2(y) = \begin{cases} \bar{G}(t + s - y), & \text{if } y < t \\ 0, & \text{otherwise} \end{cases}
$$

$$
P_3(y) = \begin{cases} \bar{G}(t + s - y), & \text{if } t < y < t + s \\ 0, & \text{otherwise} \end{cases}
$$

$$
P_4(y) = 1 - P_1(y) - P_2(y) - P_3(y)
$$

Thus, if $N_i = N_i(s + t)$, i = 1, 2, 3, denotes the number of type i events that
occur, then from Proposition 5.3, $N_i$, i = 1, 2, 3, are independent Poisson random
variables with respective means

$$
E[N_i] = \lambda \int_0^{t+s} P_i(y)\, dy, \qquad i = 1, 2, 3
$$


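Because

$$
Y(t) = N_1 + N_2, \qquad Y(t + s) = N_2 + N_3
$$

it is now an easy matter to compute the joint distribution of Y(t) and Y(t + s).
For instance,

$$
\begin{aligned}
\operatorname{Cov}[Y(t), Y(t + s)] &= \operatorname{Cov}(N_1 + N_2,\ N_2 + N_3) \\
&= \operatorname{Cov}(N_2, N_2) \qquad \text{by independence of } N_1, N_2, N_3 \\
&= \operatorname{Var}(N_2) \\
&= \lambda \int_0^t \bar{G}(t + s - y)\, dy = \lambda \int_0^t \bar{G}(u + s)\, du
\end{aligned}
$$

where the last equality follows since the variance of a Poisson random variable
equals its mean, and from the substitution u = t - y. Also, the joint distribution
of Y(t) and Y(t + s) is as follows:

$$
\begin{aligned}
P\{Y(t) = i,\ Y(t + s) = j\} &= P\{N_1 + N_2 = i,\ N_2 + N_3 = j\} \\
&= \sum_{l=0}^{\min(i,j)} P\{N_2 = l,\ N_1 = i - l,\ N_3 = j - l\} \\
&= \sum_{l=0}^{\min(i,j)} P\{N_2 = l\}\, P\{N_1 = i - l\}\, P\{N_3 = j - l\} \qquad ■
\end{aligned}
$$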
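
The joint distribution above is easy to tabulate once the three means are known. The sketch below does this under an assumed exponential service distribution with rate ν, for which the integrals of $P_1$, $P_2$, $P_3$ have simple closed forms; the function names and the numerical values are illustrative only. It then confirms numerically that the covariance computed from the table agrees with $\operatorname{Var}(N_2) = E[N_2]$.

```python
import math

def poisson_pmf(k, mean):
    """P{N = k} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def type_means(lam, nu, t, s):
    """E[N_1], E[N_2], E[N_3] when G(x) = 1 - exp(-nu * x) (an assumed service law)."""
    a = 1.0 - math.exp(-nu * t)
    b = math.exp(-nu * s)
    m1 = lam * (1.0 - b) * a / nu   # arrive before t, finish in (t, t + s)
    m2 = lam * b * a / nu           # arrive before t, finish after t + s
    m3 = lam * (1.0 - b) / nu       # arrive in (t, t + s), finish after t + s
    return m1, m2, m3

def joint_pmf(i, j, means):
    """P{Y(t) = i, Y(t + s) = j} as the sum over l = 0, ..., min(i, j)."""
    m1, m2, m3 = means
    return sum(poisson_pmf(l, m2) * poisson_pmf(i - l, m1) * poisson_pmf(j - l, m3)
               for l in range(min(i, j) + 1))

means = type_means(lam=2.0, nu=1.0, t=1.0, s=0.5)
size = 40   # truncation point; the neglected tail mass is negligible for these means
table = [[joint_pmf(i, j, means) for j in range(size)] for i in range(size)]
total = sum(sum(row) for row in table)
mean_i = sum(i * table[i][j] for i in range(size) for j in range(size))
mean_j = sum(j * table[i][j] for i in range(size) for j in range(size))
cov = sum(i * j * table[i][j] for i in range(size) for j in range(size)) - mean_i * mean_j
print(total, cov, means[1])
```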


Example 5.19 (Minimizing the Number of Encounters) Suppose that cars enter
a one-way highway in accordance with a Poisson process with rate λ. The cars
enter at point a and depart at point b (see Figure 5.2). Each car travels at a
constant speed that is randomly determined, independently from car to car, from
the distribution G. When a faster car encounters a slower one, it passes it with
no time being lost. If your car enters the highway at time s and you are able to
choose your speed, what speed minimizes the expected number of encounters you
will have with other cars, where we say that an encounter occurs each time your
car either passes or is passed by another car?

Figure 5.2 Cars enter at point a and depart at b.


Solution: We will show that for large s the speed that minimizes the expected
number of encounters is the median of the speed distribution G. To see this,
suppose that the speed x is chosen. Let d = b - a denote the length of the road.
Upon choosing the speed x, it follows that your car will enter the road at time s
and will depart at time $s + t_0$, where $t_0 = d/x$ is the travel time.

Now, the other cars enter the road according to a Poisson process with
rate λ. Each of them chooses a speed X according to the distribution G, and
this results in a travel time T = d/X. Let F denote the distribution of travel
time T. That is,

$$
F(t) = P\{T \le t\} = P\{d/X \le t\} = P\{X \ge d/t\} = \bar{G}(d/t)
$$


Let us say that an event occurs at time t if a car enters the highway at time t.
Also, say that the event is a type 1 event if it results in an encounter with your
car. Now, your car will enter the road at time s and will exit at time $s + t_0$.
Hence, a car will encounter your car if it enters before s and exits after $s + t_0$
(in which case your car will pass it on the road) or if it enters after s but exits
before $s + t_0$ (in which case it will pass your car). As a result, a car that enters
the road at time t will encounter your car if its travel time T is such that

$$
\begin{aligned}
t + T &> s + t_0, \qquad \text{if } t < s \\
t + T &< s + t_0, \qquad \text{if } s < t < s + t_0
\end{aligned}
$$

From the preceding, we see that an event at time t will, independently of other
events, be a type 1 event with probability p(t) given by

$$
p(t) = \begin{cases}
P\{t + T > s + t_0\} = \bar{F}(s + t_0 - t), & \text{if } t < s \\
P\{t + T < s + t_0\} = F(s + t_0 - t), & \text{if } s < t < s + t_0 \\
0, & \text{if } t > s + t_0
\end{cases}
$$


Since events (that is, cars entering the road) are occurring according to a Poisson
process it follows, upon applying Proposition 5.3, that the total number of
type 1 events that ever occurs is Poisson with mean

$$
\begin{aligned}
\lambda \int_0^{\infty} p(t)\, dt &= \lambda \int_0^{s} \bar{F}(s + t_0 - t)\, dt + \lambda \int_s^{s + t_0} F(s + t_0 - t)\, dt \\
&= \lambda \int_{t_0}^{s + t_0} \bar{F}(y)\, dy + \lambda \int_0^{t_0} F(y)\, dy
\end{aligned}
$$

To choose the value of $t_0$ that minimizes the preceding quantity, we differentiate.
This gives

$$
\frac{d}{dt_0} \left\{ \lambda \int_0^{\infty} p(t)\, dt \right\} = \lambda \{ \bar{F}(s + t_0) - \bar{F}(t_0) + F(t_0) \}
$$

Setting this equal to 0, and using that $\bar{F}(s + t_0) \approx 0$ when s is large, we see
that the optimal travel time $t_0$ is such that

$$
F(t_0) - \bar{F}(t_0) = 0
$$

or

$$
F(t_0) - [1 - F(t_0)] = 0
$$

or

$$
F(t_0) = \frac{1}{2}
$$

That is, the optimal travel time is the median of the travel time distribution.
Since the speed X is equal to the distance d divided by the travel time T,
it follows that the optimal speed $x_0 = d/t_0$ is such that

$$
F(d/x_0) = \frac{1}{2}
$$

Since

$$
F(d/x_0) = \bar{G}(x_0)
$$

we see that $\bar{G}(x_0) = \frac{1}{2}$ and so the optimal speed is the median of the distribution
of speeds.

Summing up, we have shown that for any speed x the number of encounters
with other cars will be a Poisson random variable, and the mean of this
Poisson will be smallest when the speed x is taken to be the median of the
distribution G. ■
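
To see the median emerge numerically, the sketch below minimizes the expected number of encounters over a grid of travel times. It is only an illustration under a convenient assumption that is not in the example itself: the travel time T of the other cars is taken to be exponential with rate r, so that both integrals in the mean have closed forms. The names and parameter values are hypothetical.

```python
import math

def expected_encounters(t0, lam, r, s):
    """Mean number of encounters for travel time t0 when F(y) = 1 - exp(-r * y) (assumed)."""
    # lam * integral of (1 - F) over (t0, s + t0): cars ahead of you that you will pass
    passed_by_you = (math.exp(-r * t0) - math.exp(-r * (s + t0))) / r
    # lam * integral of F over (0, t0): cars entering after you that pass you
    passing_you = t0 - (1.0 - math.exp(-r * t0)) / r
    return lam * (passed_by_you + passing_you)

lam, r, s = 1.0, 1.0, 10.0                      # illustrative values; s is "large"
grid = [k / 1000 for k in range(1, 5001)]       # candidate travel times 0.001, ..., 5.000
best_t0 = min(grid, key=lambda t0: expected_encounters(t0, lam, r, s))
median_travel_time = math.log(2.0) / r          # median of the assumed exponential law
print(best_t0, median_travel_time)
```

For these values the grid minimizer lies within the grid spacing of the median ln 2 ≈ 0.693, and the corresponding speed is $d$ divided by that travel time.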


Example 5.20 (Tracking the Number of HIV Infections) There is a relatively
long incubation period from the time when an individual becomes infected with
the HIV virus, which causes AIDS, until the symptoms of the disease appear.
As a result, it is difficult for public health officials to be certain of the number
of members of the population that are infected at any given time. We will now
present a first approximation model for this phenomenon, which can be used to
obtain a rough estimate of the number of infected individuals.

Let us suppose that individuals contract the HIV virus in accordance with a
Poisson process whose rate λ is unknown. Suppose that the time from when an
individual becomes infected until symptoms of the disease appear is a random
variable having a known distribution G. Suppose also that the incubation times
of different infected individuals are independent.

Let $N_1(t)$ denote the number of individuals who have shown symptoms of
the disease by time t. Also, let $N_2(t)$ denote the number who are HIV positive
but have not yet shown any symptoms by time t. Now, since an individual who
contracts the virus at time s will have symptoms by time t with probability G(t - s)
and will not with probability $\bar{G}(t - s)$, it follows from Proposition 5.3 that $N_1(t)$
and $N_2(t)$ are independent Poisson random variables with respective means

$$
E[N_1(t)] = \lambda \int_0^t G(t - s)\, ds = \lambda \int_0^t G(y)\, dy
$$

and

$$
E[N_2(t)] = \lambda \int_0^t \bar{G}(t - s)\, ds = \lambda \int_0^t \bar{G}(y)\, dy
$$


Now, if we knew λ, then we could use it to estimate $N_2(t)$, the number of
individuals infected but without any outward symptoms at time t, by its mean
value $E[N_2(t)]$. However, since λ is unknown, we must first estimate it. Now, we
will presumably know the value of $N_1(t)$, and so we can use its known value as
an estimate of its mean $E[N_1(t)]$. That is, if the number of individuals who have
exhibited symptoms by time t is $n_1$, then we can estimate that

$$
n_1 \approx E[N_1(t)] = \lambda \int_0^t G(y)\, dy
$$

Therefore, we can estimate λ by the quantity $\hat{\lambda}$ given by

$$
\hat{\lambda} = n_1 \Big/ \int_0^t G(y)\, dy
$$

Using this estimate of λ, we can estimate the number of infected but symptomless
individuals at time t by

$$
\text{estimate of } N_2(t) = \hat{\lambda} \int_0^t \bar{G}(y)\, dy = \frac{n_1 \int_0^t \bar{G}(y)\, dy}{\int_0^t G(y)\, dy}
$$

For example, suppose that G is exponential with mean μ. Then $\bar{G}(y) = e^{-y/\mu}$,
and a simple integration gives that

$$
\text{estimate of } N_2(t) = \frac{n_1 \mu (1 - e^{-t/\mu})}{t - \mu (1 - e^{-t/\mu})}
$$

If we suppose that t = 16 years, μ = 10 years, and $n_1$ = 220 thousand, then the
estimate of the number of infected but symptomless individuals at time 16 is

$$
\text{estimate} = \frac{2{,}200 (1 - e^{-1.6})}{16 - 10 (1 - e^{-1.6})} = 218.96
$$

That is, if we suppose that the foregoing model is approximately correct (and
we should be aware that the assumption of a constant infection rate λ that
is unchanging over time is almost certainly a weak point of the model), then
if the incubation period is exponential with mean 10 years and if the total
number of individuals who have exhibited AIDS symptoms during the first
16 years of the epidemic is 220 thousand, then we can expect that approximately
219 thousand individuals are HIV positive though symptomless at
time 16. ■
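
The arithmetic in this example can be reproduced in a few lines. The function name below is hypothetical; the formula is the exponential-incubation estimate derived above, with counts expressed in thousands.

```python
import math

def symptomless_estimate(n1, t, mu):
    """Estimate of N_2(t) when the incubation time is exponential with mean mu."""
    tail_integral = mu * (1.0 - math.exp(-t / mu))   # integral of exp(-y / mu) over (0, t)
    cdf_integral = t - tail_integral                 # integral of 1 - exp(-y / mu) over (0, t)
    return n1 * tail_integral / cdf_integral

estimate = symptomless_estimate(n1=220.0, t=16.0, mu=10.0)
print(round(estimate, 2))   # 218.96 (thousand), matching the value in the text
```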


Proof of Proposition 5.3. Let us compute the joint probability
$P\{N_i(t) = n_i,\ i = 1, \ldots, k\}$. To do so note first that in order for there to have been $n_i$ type i
events for i = 1, ..., k there must have been a total of $\sum_{i=1}^{k} n_i$ events. Hence,
conditioning on N(t) yields

$$
\begin{aligned}
P\{N_1(t) = n_1, \ldots, N_k(t) = n_k\}
&= P\Big\{N_1(t) = n_1, \ldots, N_k(t) = n_k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\} \\
&\quad \times P\Big\{N(t) = \sum_{i=1}^{k} n_i\Big\}
\end{aligned}
$$

Now consider an arbitrary event that occurred in the interval [0, t]. If it had
occurred at time s, then the probability that it would be a type i event would
be $P_i(s)$. Hence, since by Theorem 5.2 this event will have occurred at some time
uniformly distributed on [0, t], it follows that the probability that this event will
be a type i event is

$$
P_i = \frac{1}{t} \int_0^t P_i(s)\, ds
$$

independently of the other events. Hence,

$$
P\Big\{N_i(t) = n_i,\ i = 1, \ldots, k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\}
$$

will just equal the multinomial probability of $n_i$ type i outcomes for i = 1, ..., k
when each of $\sum_{i=1}^{k} n_i$ independent trials results in outcome i with probability
$P_i$, i = 1, ..., k. That is,

$$
P\Big\{N_1(t) = n_1, \ldots, N_k(t) = n_k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\} = \frac{\left(\sum_{i=1}^{k} n_i\right)!}{n_1! \cdots n_k!}\, P_1^{n_1} \cdots P_k^{n_k}
$$

Consequently,

$$
\begin{aligned}
P\{N_1(t) = n_1, \ldots, N_k(t) = n_k\}
&= \frac{\left(\sum_i n_i\right)!}{n_1! \cdots n_k!}\, P_1^{n_1} \cdots P_k^{n_k}\, e^{-\lambda t} \frac{(\lambda t)^{\sum_i n_i}}{\left(\sum_i n_i\right)!} \\
&= \prod_{i=1}^{k} e^{-\lambda t P_i} \frac{(\lambda t P_i)^{n_i}}{n_i!}
\end{aligned}
$$

and the proof is complete. ■
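
The final step of the proof, in which the multinomial probability times the Poisson probability of the total factors into a product of Poisson probabilities, can be spot-checked numerically. In the sketch below the averaged type probabilities $P_i$ are simply supplied as numbers summing to 1; those values, the counts, and the function names are arbitrary choices for illustration.

```python
import math

def multinomial_times_poisson(counts, probs, lam, t):
    """Multinomial probability of the counts times P{N(t) = sum of counts}."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)      # exact integer multinomial coefficient
    multinomial = coef * math.prod(p ** c for p, c in zip(probs, counts))
    poisson_total = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    return multinomial * poisson_total

def product_of_poissons(counts, probs, lam, t):
    """Product over i of Poisson probabilities with means lam * t * P_i."""
    return math.prod(
        math.exp(-lam * t * p) * (lam * t * p) ** c / math.factorial(c)
        for p, c in zip(probs, counts)
    )

counts, probs = (2, 0, 3), (0.2, 0.5, 0.3)   # probs play the role of P_1, P_2, P_3
lhs = multinomial_times_poisson(counts, probs, lam=1.5, t=2.0)
rhs = product_of_poissons(counts, probs, lam=1.5, t=2.0)
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, as the algebra in the proof requires.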
We now present some additional examples of the usefulness of Theorem 5.2. 


Example 5.21 Insurance claims are made at times distributed according to a
Poisson process with rate λ; the successive claim amounts are independent random
variables having distribution G with mean μ, and are independent of the claim
arrival times. Let $S_i$ and $C_i$ denote, respectively, the time and the amount of the
ith claim. The total discounted cost of all claims made up to time t, call it D(t),
is defined by

$$
D(t) = \sum_{i=1}^{N(t)} e^{-\alpha S_i} C_i
$$

where α is the discount rate and N(t) is the number of claims made by time t. To
determine the expected value of D(t), we condition on N(t) to obtain

$$
E[D(t)] = \sum_{n=0}^{\infty} E[D(t) \mid N(t) = n]\, e^{-\lambda t} \frac{(\lambda t)^n}{n!}
$$

Now, conditional on N(t) = n, the claim arrival times $S_1, \ldots, S_n$ are distributed
as the ordered values $U_{(1)}, \ldots, U_{(n)}$ of n independent uniform (0, t) random variables
$U_1, \ldots, U_n$. Therefore,

$$
\begin{aligned}
E[D(t) \mid N(t) = n] &= E\left[\sum_{i=1}^{n} C_i e^{-\alpha U_{(i)}}\right] \\
&= \sum_{i=1}^{n} E[C_i e^{-\alpha U_{(i)}}] \\
&= \sum_{i=1}^{n} E[C_i]\, E[e^{-\alpha U_{(i)}}]
\end{aligned}
$$

where the final equality used the independence of the claim amounts and their
arrival times. Because $E[C_i] = \mu$, continuing the preceding gives

$$
\begin{aligned}
E[D(t) \mid N(t) = n] &= \mu \sum_{i=1}^{n} E[e^{-\alpha U_{(i)}}] \\
&= \mu E\left[\sum_{i=1}^{n} e^{-\alpha U_{(i)}}\right] \\
&= \mu E\left[\sum_{i=1}^{n} e^{-\alpha U_i}\right]
\end{aligned}
$$

The last equality follows because $U_{(1)}, \ldots, U_{(n)}$ are the values $U_1, \ldots, U_n$ in
increasing order, and so $\sum_{i=1}^{n} e^{-\alpha U_{(i)}} = \sum_{i=1}^{n} e^{-\alpha U_i}$. Continuing the string of
equalities, with U denoting a uniform (0, t) random variable, yields

$$
E[D(t) \mid N(t) = n] = n \mu E[e^{-\alpha U}] = n \frac{\mu}{t} \int_0^t e^{-\alpha x}\, dx = n \frac{\mu}{\alpha t} (1 - e^{-\alpha t})
$$

Therefore,

$$
E[D(t) \mid N(t)] = N(t) \frac{\mu}{\alpha t} (1 - e^{-\alpha t})
$$

Taking expectations yields the result

$$
E[D(t)] = \frac{\lambda \mu}{\alpha} (1 - e^{-\alpha t}) \qquad ■
$$
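
A short Monte Carlo run gives an independent check on this formula. The sketch below is illustrative only: it assumes exponentially distributed claim amounts (the result itself requires only that the claims have mean μ), and the parameter values, seed, and function names are made up for the example.

```python
import math
import random

def discounted_cost(lam, mean_claim, alpha, t, rng):
    """One realization of D(t): the sum of exp(-alpha * S_i) * C_i over claims up to time t."""
    total = 0.0
    s = rng.expovariate(lam)                       # time of the first claim
    while s < t:
        claim = rng.expovariate(1.0 / mean_claim)  # assumed exponential claim amount
        total += math.exp(-alpha * s) * claim
        s += rng.expovariate(lam)                  # time of the next claim
    return total

def estimate_mean_cost(lam, mean_claim, alpha, t, runs, seed=0):
    rng = random.Random(seed)
    return sum(discounted_cost(lam, mean_claim, alpha, t, rng) for _ in range(runs)) / runs

lam, mean_claim, alpha, t = 2.0, 5.0, 0.1, 10.0
simulated = estimate_mean_cost(lam, mean_claim, alpha, t, runs=20000)
exact = lam * mean_claim / alpha * (1.0 - math.exp(-alpha * t))
print(simulated, exact)
```

The simulated average should fall close to the exact value of roughly 63.2, up to Monte Carlo error.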
Example 5.22 (An Optimization Example) Suppose that items arrive at a
processing plant in accordance with a Poisson process with rate λ. At a fixed
time T, all items are dispatched from the system. The problem is to choose an
intermediate time, t ∈ (0, T), at which all items in the system are dispatched, so
as to minimize the total expected wait of all items.

If we dispatch at time t, 0 < t < T, then the expected total wait of all items
will be

$$
\frac{\lambda t^2}{2} + \frac{\lambda (T - t)^2}{2}
$$

To see why this is true, we reason as follows: The expected number of arrivals
in (0, t) is λt, and each arrival is uniformly distributed on (0, t), and hence has
expected wait t/2. Thus, the expected total wait of items arriving in (0, t) is
$\lambda t^2 / 2$. Similar reasoning holds for arrivals in (t, T), and the preceding follows.
To minimize this quantity, we differentiate with respect to t to obtain

$$
\frac{d}{dt} \left\{ \frac{\lambda t^2}{2} + \frac{\lambda (T - t)^2}{2} \right\} = \lambda t - \lambda (T - t)
$$

and equating to 0 shows that the dispatch time that minimizes the expected total
wait is t = T/2. ■


We end this section with a result, quite similar in spirit to Theorem 5.2, which
states that given $S_n$, the time of the nth event, then the first n - 1 event times are
distributed as the ordered values of a set of n - 1 random variables uniformly
distributed on (0, $S_n$).


Proposition 5.4 Given that $S_n = t$, the set $S_1, \ldots, S_{n-1}$ has the distribution of a
set of n - 1 independent uniform (0, t) random variables.


Proof. We can prove the preceding in the same manner as we did Theorem 5.2,
or we can argue more loosely as follows:

$$
\begin{aligned}
S_1, \ldots, S_{n-1} \mid S_n = t \ &\sim\ S_1, \ldots, S_{n-1} \mid S_n = t,\ N(t^-) = n - 1 \\
&\sim\ S_1, \ldots, S_{n-1} \mid N(t^-) = n - 1
\end{aligned}
$$

where ∼ means “has the same distribution as” and $t^-$ is infinitesimally smaller
than t. The result now follows from Theorem 5.2. ■


5.3.6 Estimating Software Reliability 


When a new computer software package is developed, a testing procedure is often 
put into effect to eliminate the faults, or bugs, in the package. One common 
procedure is to try the package on a set of well-known problems to see if any 
errors result. This goes on for some fixed time, with all resulting errors being
noted. Then the testing stops and the package is carefully checked to determine 
the specific bugs that were responsible for the observed errors. The package is 
then altered to remove these bugs. Because we cannot be certain that all the bugs 
in the package have been eliminated, however, a problem of great importance is 
the estimation of the error rate of the revised software package. 

To model the preceding, let us suppose that initially the package contains an
unknown number, m, of bugs, which we will refer to as bug 1, bug 2, ..., bug
m. Suppose also that bug i will cause errors to occur in accordance with a Poisson
process having an unknown rate $\lambda_i$, i = 1, ..., m. Then, for instance, the
number of errors due to bug i that occurs in any s units of operating time is
Poisson distributed with mean $\lambda_i s$. Also suppose that these Poisson processes
caused by bugs i, i = 1, ..., m, are independent. In addition, suppose that the
package is to be run for t time units with all resulting errors being noted. At the
end of this time a careful check of the package is made to determine the specific
bugs that caused the errors (that is, a debugging takes place). These bugs
are removed, and the problem is then to determine the error rate for the revised
package.

If we let

$$
\psi_i(t) = \begin{cases} 1, & \text{if bug } i \text{ has not caused an error by } t \\ 0, & \text{otherwise} \end{cases}
$$

then the quantity we wish to estimate is

$$
\Lambda(t) = \sum_i \lambda_i \psi_i(t)
$$

the error rate of the final package. To start, note that

$$
E[\Lambda(t)] = \sum_i \lambda_i E[\psi_i(t)] = \sum_i \lambda_i e^{-\lambda_i t} \tag{5.19}
$$


Now each of the bugs that is discovered would have been responsible for a certain
number of errors. Let us denote by $M_j(t)$ the number of bugs that were responsible
for j errors, j ≥ 1. That is, $M_1(t)$ is the number of bugs that caused exactly one
error, $M_2(t)$ is the number that caused two errors, and so on, with $\sum_j j M_j(t)$
equaling the total number of errors that resulted. To compute $E[M_1(t)]$, let us
define the indicator variables, $I_i(t)$, i ≥ 1, by

$$
I_i(t) = \begin{cases} 1, & \text{bug } i \text{ causes exactly 1 error} \\ 0, & \text{otherwise} \end{cases}
$$

Then,

$$
M_1(t) = \sum_i I_i(t)
$$

and so

$$
E[M_1(t)] = \sum_i E[I_i(t)] = \sum_i \lambda_i t e^{-\lambda_i t} \tag{5.20}
$$

Thus, from (5.19) and (5.20) we obtain the intriguing result that

$$
E\left[\Lambda(t) - \frac{M_1(t)}{t}\right] = 0 \tag{5.21}
$$


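This suggests the possible use of $M_1(t)/t$ as an estimate of Λ(t). To determine
whether or not $M_1(t)/t$ constitutes a “good” estimate of Λ(t) we shall look at
how far apart these two quantities tend to be. That is, we will compute

$$
\begin{aligned}
E\left[\left(\Lambda(t) - \frac{M_1(t)}{t}\right)^2\right] &= \operatorname{Var}\left(\Lambda(t) - \frac{M_1(t)}{t}\right) \qquad \text{from (5.21)} \\
&= \operatorname{Var}(\Lambda(t)) - \frac{2}{t} \operatorname{Cov}(\Lambda(t), M_1(t)) + \frac{1}{t^2} \operatorname{Var}(M_1(t))
\end{aligned}
$$

Now,

$$
\begin{aligned}
\operatorname{Var}(\Lambda(t)) &= \sum_i \lambda_i^2 \operatorname{Var}(\psi_i(t)) = \sum_i \lambda_i^2 e^{-\lambda_i t} (1 - e^{-\lambda_i t}), \\
\operatorname{Var}(M_1(t)) &= \sum_i \operatorname{Var}(I_i(t)) = \sum_i \lambda_i t e^{-\lambda_i t} (1 - \lambda_i t e^{-\lambda_i t}), \\
\operatorname{Cov}(\Lambda(t), M_1(t)) &= \operatorname{Cov}\left(\sum_i \lambda_i \psi_i(t),\ \sum_j I_j(t)\right) \\
&= \sum_i \sum_j \operatorname{Cov}(\lambda_i \psi_i(t), I_j(t)) \\
&= \sum_i \lambda_i \operatorname{Cov}(\psi_i(t), I_i(t)) \\
&= -\sum_i \lambda_i e^{-\lambda_i t}\, \lambda_i t e^{-\lambda_i t}
\end{aligned}
$$

where the last two equalities follow since $\psi_i(t)$ and $I_j(t)$ are independent when
i ≠ j because they refer to different Poisson processes and $\psi_i(t) I_i(t) = 0$. Hence,
we obtain

$$
E\left[\left(\Lambda(t) - \frac{M_1(t)}{t}\right)^2\right] = \sum_i \lambda_i^2 e^{-\lambda_i t} + \frac{1}{t} \sum_i \lambda_i e^{-\lambda_i t} = \frac{E[M_1(t) + 2 M_2(t)]}{t^2}
$$

where the last equality follows from (5.20) and the identity (which we leave as
an exercise)

$$
E[M_2(t)] = \frac{1}{2} \sum_i (\lambda_i t)^2 e^{-\lambda_i t} \tag{5.22}
$$

Thus, we can estimate the average square of the difference between Λ(t) and
$M_1(t)/t$ by the observed value of $M_1(t) + 2 M_2(t)$ divided by $t^2$.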


Example 5.23 Suppose that in 100 units of operating time 20 bugs are discovered
of which two resulted in exactly one, and three resulted in exactly two,
errors. Then we would estimate that Λ(100) is something akin to the value of
a random variable whose mean is equal to 1/50 and whose variance is equal to
8/10,000. ■
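
The two numbers in Example 5.23 follow directly from the estimators of this section, namely $M_1(t)/t$ for the error rate and $(M_1(t) + 2 M_2(t))/t^2$ for the mean squared difference. The small helper below (a hypothetical name, using exact fractions) reproduces them.

```python
from fractions import Fraction

def error_rate_estimate(m1, m2, t):
    """Return (M_1/t, (M_1 + 2*M_2)/t^2) from the observed counts m1, m2 and run time t."""
    rate = Fraction(m1, t)
    mean_square_difference = Fraction(m1 + 2 * m2, t * t)
    return rate, mean_square_difference

rate, msd = error_rate_estimate(m1=2, m2=3, t=100)
print(rate, msd)   # 1/50 and 1/1250, the latter being 8/10,000 in lowest terms
```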


5.4 Generalizations of the Poisson Process 


5.4.1 Nonhomogeneous Poisson Process 


In this section we consider two generalizations of the Poisson process. The first
of these is the nonhomogeneous, also called the nonstationary, Poisson process,
which is obtained by allowing the arrival rate at time t to be a function of t.


Definition 5.4 The counting process {N(t), t ≥ 0} is said to be a nonhomogeneous
Poisson process with intensity function λ(t), t ≥ 0, if

(i) N(0) = 0.
(ii) {N(t), t ≥ 0} has independent increments.
(iii) P{N(t + h) - N(t) ≥ 2} = o(h).
(iv) P{N(t + h) - N(t) = 1} = λ(t)h + o(h).


Time sampling an ordinary Poisson process generates a nonhomogeneous Poisson
process. That is, let {N(t), t ≥ 0} be a Poisson process with rate λ, and suppose
that an event occurring at time t is, independently of what has occurred prior
to t, counted with probability p(t). With $N_c(t)$ denoting the number of counted
events by time t, the counting process $\{N_c(t), t \ge 0\}$ is a nonhomogeneous Poisson
process with intensity function λ(t) = λp(t). This is verified by noting that
$\{N_c(t), t \ge 0\}$ satisfies the nonhomogeneous Poisson process axioms.

1. $N_c(0) = 0$.

2. The number of counted events in (s, s + t) depends solely on the number of events
of the Poisson process in (s, s + t), which is independent of what has occurred prior
to time s. Consequently, the number of counted events in (s, s + t) is independent of
the process of counted events prior to s, thus establishing the independent increment
property.

3. Let $N_c(t, t + h) = N_c(t + h) - N_c(t)$, with a similar definition of N(t, t + h). Then

$$
P\{N_c(t, t + h) \ge 2\} \le P\{N(t, t + h) \ge 2\} = o(h)
$$

4. To compute $P\{N_c(t, t + h) = 1\}$, condition on N(t, t + h).

$$
\begin{aligned}
P\{N_c(t, t + h) = 1\} &= P\{N_c(t, t + h) = 1 \mid N(t, t + h) = 1\}\, P\{N(t, t + h) = 1\} \\
&\quad + P\{N_c(t, t + h) = 1 \mid N(t, t + h) \ge 2\}\, P\{N(t, t + h) \ge 2\} \\
&= P\{N_c(t, t + h) = 1 \mid N(t, t + h) = 1\}\, \lambda h + o(h) \\
&= p(t) \lambda h + o(h)
\end{aligned}
$$

Not only does time sampling a Poisson process result in a nonhomogeneous 
Poisson process, but it also works the other way: every nonhomogeneous Pois- 
son process with a bounded intensity function can be thought of as being a time 
sampling of a Poisson process. To show this, we start by showing that the super- 
position of two independent nonhomogeneous Poisson processes remains a non- 
homogeneous Poisson process. 


Proposition 5.4 Let {N(t), t ≥ 0} and {M(t), t ≥ 0} be independent nonhomogeneous
Poisson processes, with respective intensity functions λ(t) and μ(t), and
let N*(t) = N(t) + M(t). Then, the following are true.

(a) {N*(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity function
λ(t) + μ(t).
(b) Given that an event of the {N*(t)} process occurs at time t then, independent of
what occurred prior to t, the event at t was from the {N(t)} process with probability
$\dfrac{\lambda(t)}{\lambda(t) + \mu(t)}$.

Proof. To verify that {N*(t), t ≥ 0} is a nonhomogeneous Poisson process with
intensity function λ(t) + μ(t), we will argue that it satisfies the nonhomogeneous
Poisson process axioms.

1. N*(0) = N(0) + M(0) = 0.

2. To verify independent increments, let $I_1, \ldots, I_n$ be nonoverlapping intervals. Let N(I)
and M(I) denote, respectively, the number of events from the {N(t)} process and
from the {M(t)} process that are in the interval I. Because each counting process
has independent increments, and the two processes are independent of each other, it
follows that $N(I_1), \ldots, N(I_n), M(I_1), \ldots, M(I_n)$ are all independent, and thus so are
$N(I_1) + M(I_1), \ldots, N(I_n) + M(I_n)$, which shows that {N*(t), t ≥ 0} also possesses
independent increments.

3. In order for there to be exactly one event of the N* process between t and t + h,
either there must be one event of the N process and 0 events of the M process or the
reverse. The first of these mutually exclusive possibilities occurs with probability

$$
\begin{aligned}
P\{N(t, t + h) = 1,\ M(t, t + h) = 0\} &= P\{N(t, t + h) = 1\}\, P\{M(t, t + h) = 0\} \\
&= (\lambda(t) h + o(h)) (1 - \mu(t) h + o(h)) \\
&= \lambda(t) h + o(h)
\end{aligned}
$$

Similarly, the second possibility occurs with probability

$$
P\{N(t, t + h) = 0,\ M(t, t + h) = 1\} = \mu(t) h + o(h)
$$

yielding

$$
P\{N^*(t + h) - N^*(t) = 1\} = (\lambda(t) + \mu(t)) h + o(h)
$$

4. In order for there to be at least two events of the N* process between t and t + h,
one of the following three possibilities must occur: there are at least two events of the
N process between t and t + h; there are at least two events of the M process between
t and t + h; or both processes have exactly one event between t and t + h. Each of
the first two of these possibilities occurs with probability o(h), while the third occurs
with probability $(\lambda(t) h + o(h)) (\mu(t) h + o(h)) = o(h)$. Thus,

$$
P\{N^*(t + h) - N^*(t) \ge 2\} \le o(h)
$$

Thus, part (a) is proven.

To prove (b), note first that it follows from independent increments that which
process caused the event at time t is independent of what occurred prior to t. To
find the conditional probability that the event at time t is from the N process, we
use that

$$
\begin{aligned}
P\{N(t, t + h) = 1 \mid N^*(t, t + h) = 1\} &= \frac{P\{N(t, t + h) = 1,\ M(t, t + h) = 0\}}{P\{N^*(t, t + h) = 1\}} \\
&= \frac{\lambda(t) h + o(h)}{(\lambda(t) + \mu(t)) h + o(h)} \\
&= \frac{\lambda(t) + \frac{o(h)}{h}}{\lambda(t) + \mu(t) + \frac{o(h)}{h}}
\end{aligned}
$$

Letting h → 0 in the preceding proves (b). ■


Now, suppose that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with a
bounded intensity function λ(t), and suppose that λ is such that λ(t) ≤ λ, for all
t ≥ 0. Letting {M(t), t ≥ 0} be a nonhomogeneous Poisson process with intensity
function μ(t) = λ - λ(t), t ≥ 0, that is independent of {N(t), t ≥ 0}, it follows
from Proposition 5.4 that {N(t), t ≥ 0} can be regarded as being the process
of time-sampled events of the Poisson process {N(t) + M(t), t ≥ 0}, where an
event of the Poisson process that occurs at time t is counted with probability
p(t) = λ(t)/λ.
With this interpretation of a nonhomogeneous Poisson process as being a
time-sampled Poisson process, the number of events of the nonhomogeneous
Poisson process by time t, namely, N(t), is equal to the number of counted events
of the Poisson process by time t. Consequently, from Proposition 5.3 it follows
that N(t) is a Poisson random variable with mean

$$
E[N(t)] = \lambda \int_0^t \frac{\lambda(y)}{\lambda}\, dy = \int_0^t \lambda(y)\, dy
$$

Moreover, by regarding the nonhomogeneous Poisson process as starting at
time s, the preceding yields that N(t + s) - N(s), the number of events in its first t
time units, is a Poisson random variable with mean
$\int_0^t \lambda(s + y)\, dy = \int_s^{s+t} \lambda(y)\, dy$.
The function m(t) defined by

$$
m(t) = \int_0^t \lambda(y)\, dy
$$

is called the mean value function of the nonhomogeneous Poisson process.
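
The time-sampling interpretation also gives a direct way to simulate a nonhomogeneous Poisson process with a bounded intensity: generate an ordinary Poisson process with rate equal to the bound and keep an event at time t with probability λ(t) divided by that bound. The sketch below is one minimal way to do this; the function name, the linear intensity 5 + 5t on (0, 3), the bound 20, and the seed are assumptions chosen for illustration. The average count is compared with the mean value function m(3) = 37.5.

```python
import random

def thinned_arrivals(intensity, bound, horizon, rng):
    """Event times on (0, horizon), assuming intensity(t) <= bound throughout that interval."""
    times = []
    s = rng.expovariate(bound)                 # candidate events: Poisson process with rate `bound`
    while s < horizon:
        if rng.random() < intensity(s) / bound:
            times.append(s)                    # candidate is counted with probability intensity(s) / bound
        s += rng.expovariate(bound)
    return times

def intensity(t):
    return 5.0 + 5.0 * t                       # at most 20 on (0, 3)

rng = random.Random(1)
runs = 4000
average_count = sum(len(thinned_arrivals(intensity, 20.0, 3.0, rng)) for _ in range(runs)) / runs
mean_value = 5.0 * 3.0 + 2.5 * 3.0 ** 2        # m(3) = integral of 5 + 5y over (0, 3) = 37.5
print(average_count, mean_value)
```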


Remark That N(s + t) - N(s) has a Poisson distribution with mean $\int_s^{s+t} \lambda(y)\, dy$
is a consequence of the Poisson limit of the sum of independent Bernoulli random
variables (see Example 2.47). To see why, subdivide the interval [s, s + t] into n
subintervals of length t/n, where subinterval i goes from s + (i - 1)t/n to s + it/n,
i = 1, ..., n. Let $N_i = N(s + it/n) - N(s + (i - 1)t/n)$ be the number of events that
occur in subinterval i, and note that

$$
\begin{aligned}
P\{\ge 2 \text{ events in some subinterval}\} &= P\left(\bigcup_{i=1}^{n} \{N_i \ge 2\}\right) \\
&\le \sum_{i=1}^{n} P\{N_i \ge 2\} \\
&= n\, o(t/n) \qquad \text{by Axiom (iii)}
\end{aligned}
$$

Because

$$
\lim_{n \to \infty} n\, o(t/n) = \lim_{n \to \infty} t\, \frac{o(t/n)}{t/n} = 0
$$

it follows that, as n increases to ∞, the probability of having two or more events
in any of the n subintervals goes to 0. Consequently, with a probability going to 1,
N(s + t) - N(s) will equal the number of subintervals in which an event occurs. Because the
probability of an event in subinterval i is $\lambda(s + it/n)\, t/n + o(t/n)$, it follows, because
the number of events in different subintervals are independent, that when n is
large the number of subintervals that contain an event is approximately a Poisson
random variable with mean

$$
\sum_{i=1}^{n} \lambda\left(s + i\frac{t}{n}\right) \frac{t}{n} + n\, o(t/n)
$$

But,

$$
\lim_{n \to \infty} \left[\sum_{i=1}^{n} \lambda\left(s + i\frac{t}{n}\right) \frac{t}{n} + n\, o(t/n)\right] = \int_s^{s+t} \lambda(y)\, dy
$$

and the result follows. ■


The importance of the nonhomogeneous Poisson process resides in the fact 
that we no longer require the condition of stationary increments. Thus we now 
allow for the possibility that events may be more likely to occur at certain times 
than during other times. 


Example 5.24 Siegbert runs a hot dog stand that opens at 8 A.M. From 8 until 
11 A.M. customers seem to arrive, on the average, at a steadily increasing rate 
that starts with an initial rate of 5 customers per hour at 8 A.M. and reaches a 
maximum of 20 customers per hour at 11 A.M. From 11 A.M. until 1 P.M. the 
(average) rate seems to remain constant at 20 customers per hour. However, 
the (average) arrival rate then drops steadily from 1 P.M. until closing time at 
5 P.M. at which time it has the value of 12 customers per hour. If we assume 
that the numbers of customers arriving at Siegbert’s stand during disjoint time 
periods are independent, then what is a good probability model for the preced- 
ing? What is the probability that no customers arrive between 8:30 A.M. and 
9:30 A.M. on Monday morning? What is the expected number of arrivals in this 
period? 


Solution: A good model for the preceding would be to assume that arrivals
constitute a nonhomogeneous Poisson process with intensity function λ(t)
given by

$$
\lambda(t) = \begin{cases}
5 + 5t, & 0 \le t \le 3 \\
20, & 3 \le t \le 5 \\
20 - 2(t - 5), & 5 \le t \le 9
\end{cases}
$$

and

$$
\lambda(t) = \lambda(t - 9) \qquad \text{for } t > 9
$$

Note that N(t) represents the number of arrivals during the first t hours that
the store is open. That is, we do not count the hours between 5 P.M. and 8 A.M.
If for some reason we wanted N(t) to represent the number of arrivals during
the first t hours regardless of whether the store was open or not, then, assuming
that the process begins at midnight we would let

$$
\lambda(t) = \begin{cases}
0, & 0 \le t \le 8 \\
5 + 5(t - 8), & 8 \le t \le 11 \\
20, & 11 \le t \le 13 \\
20 - 2(t - 13), & 13 \le t \le 17 \\
0, & 17 < t \le 24
\end{cases}
$$

and

$$
\lambda(t) = \lambda(t - 24) \qquad \text{for } t > 24
$$

As the number of arrivals between 8:30 A.M. and 9:30 A.M. will be Poisson
with mean $m(\tfrac{3}{2}) - m(\tfrac{1}{2})$ in the first representation (and $m(\tfrac{19}{2}) - m(\tfrac{17}{2})$ in the
second representation), we have that the probability that this number is zero is

$$
\exp\left\{-\int_{1/2}^{3/2} (5 + 5t)\, dt\right\} = e^{-10}
$$

and the mean number of arrivals is

$$
\int_{1/2}^{3/2} (5 + 5t)\, dt = 10 \qquad ■
$$
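
Both representations of the intensity function in this example can be coded and integrated numerically to confirm that they give the same answer. The sketch below covers a single day only and uses hypothetical function names; the midpoint rule is exact, up to rounding, for the linear intensity on the interval of interest.

```python
import math

def intensity_open_hours(t):
    """lambda(t) with t measured in hours since 8 A.M., for 0 <= t <= 9."""
    if t <= 3.0:
        return 5.0 + 5.0 * t
    if t <= 5.0:
        return 20.0
    return 20.0 - 2.0 * (t - 5.0)

def intensity_clock(t):
    """lambda(t) with t measured in hours since midnight, for 0 <= t <= 24."""
    if t < 8.0 or t > 17.0:
        return 0.0                      # the stand is closed
    return intensity_open_hours(t - 8.0)

def integrate(f, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

mean_first = integrate(intensity_open_hours, 0.5, 1.5)   # 8:30 to 9:30, first representation
mean_second = integrate(intensity_clock, 8.5, 9.5)       # same hour, second representation
prob_no_arrivals = math.exp(-mean_first)
print(mean_first, mean_second, prob_no_arrivals)
```

Both means come out as 10 up to rounding, so the probability of no arrivals is $e^{-10}$, roughly 4.5e-05.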


Suppose that events occur according to a Poisson process with rate λ, and
suppose that, independent of what has previously occurred, an event at time
s is a type 1 event with probability $P_1(s)$ or a type 2 event with probability
$P_2(s) = 1 - P_1(s)$. If $N_i(t)$, t ≥ 0, denotes the number of type i events by time t,
then it easily follows from Definition 5.4 that $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$
are independent nonhomogeneous Poisson processes with respective intensity
functions $\lambda_i(t) = \lambda P_i(t)$, i = 1, 2. (The proof mimics that of Proposition 5.2.) This
result gives us another way of understanding (or of proving) the time sampling
Poisson process result of Proposition 5.3, which states that $N_1(t)$ and $N_2(t)$
are independent Poisson random variables with means $E[N_i(t)] = \lambda \int_0^t P_i(s)\, ds$,
i = 1, 2.


Example 5.25 (The Output Process of an Infinite Server Poisson Queue) It turns
out that the output process of the M/G/$\infty$ queue (that is, of the infinite server
queue having Poisson arrivals and general service distribution $G$) is a
nonhomogeneous Poisson process having intensity function $\lambda(t) = \lambda G(t)$. To verify
this claim, let us first argue that the departure process has independent
increments. Towards this end, consider nonoverlapping intervals $O_1, \ldots, O_k$; now
say that an arrival is type $i$, $i = 1, \ldots, k$, if that arrival departs in the interval $O_i$.
By Proposition 5.3, it follows that the numbers of departures in these intervals
are independent, thus establishing independent increments. Now, suppose that
an arrival is "counted" if that arrival departs between $t$ and $t + h$. Because an
arrival at time $s$, $s < t + h$, will be counted with probability $G(t - s + h) - G(t - s)$,
it follows from Proposition 5.3 that the number of departures in $(t, t + h)$ is a
Poisson random variable with mean


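\[
\begin{aligned}
\lambda \int_0^{t+h} [G(t - s + h) - G(t - s)]\, ds
&= \lambda \int_0^{t+h} [G'(t - s + h)h + o(h)]\, ds \\
&= \lambda h \int_0^{t+h} G'(y)\, dy + o(h) \\
&= \lambda G(t) h + o(h)
\end{aligned}
\]

Therefore,

\[
P\{1 \text{ departure in } (t, t + h)\} = \lambda G(t) h\, e^{-\lambda G(t) h} + o(h) = \lambda G(t) h + o(h)
\]

and

\[
P\{\ge 2 \text{ departures in } (t, t + h)\} = o(h)
\]

which completes the verification. ■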


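If we let $S_n$ denote the time of the $n$th event of the nonhomogeneous Poisson
process, then we can obtain its density as follows:

\[
\begin{aligned}
P\{t < S_n < t + h\} &= P\{N(t) = n - 1, \text{ one event in } (t, t + h)\} + o(h) \\
&= P\{N(t) = n - 1\}\, P\{\text{one event in } (t, t + h)\} + o(h) \\
&= e^{-m(t)} \frac{[m(t)]^{n-1}}{(n-1)!} [\lambda(t) h + o(h)] + o(h) \\
&= \lambda(t) e^{-m(t)} \frac{[m(t)]^{n-1}}{(n-1)!} h + o(h)
\end{aligned}
\]

which implies that

\[
f_{S_n}(t) = \lambda(t) e^{-m(t)} \frac{[m(t)]^{n-1}}{(n-1)!}
\]

where

\[
m(t) = \int_0^t \lambda(s)\, ds
\]
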
5.4.2 Compound Poisson Process 


A stochastic process $\{X(t), t \ge 0\}$ is said to be a compound Poisson process if it
can be represented as

\[
X(t) = \sum_{i=1}^{N(t)} Y_i, \quad t \ge 0 \tag{5.23}
\]

where $\{N(t), t \ge 0\}$ is a Poisson process, and $\{Y_i, i \ge 1\}$ is a family of independent
and identically distributed random variables that is also independent of $\{N(t),
t \ge 0\}$. As noted in Chapter 3, the random variable $X(t)$ is said to be a compound
Poisson random variable.


Examples of Compound Poisson Processes 


(i) If $Y_i \equiv 1$, then $X(t) = N(t)$, and so we have the usual Poisson process.

(ii) Suppose that buses arrive at a sporting event in accordance with a Poisson process,
and suppose that the numbers of fans in each bus are assumed to be independent and
identically distributed. Then $\{X(t), t \ge 0\}$ is a compound Poisson process where $X(t)$
denotes the number of fans who have arrived by $t$. In Equation (5.23) $Y_i$ represents
the number of fans in the $i$th bus.

(iii) Suppose customers leave a supermarket in accordance with a Poisson process. If
the $Y_i$, the amount spent by the $i$th customer, $i = 1, 2, \ldots$, are independent and
identically distributed, then $\{X(t), t \ge 0\}$ is a compound Poisson process when $X(t)$
denotes the total amount of money spent by time $t$. ■


Because $X(t)$ is a compound Poisson random variable with Poisson parameter
$\lambda t$, we have from Examples 3.10 and 3.17 that

\[
E[X(t)] = \lambda t E[Y_1] \tag{5.24}
\]

and

\[
\mathrm{Var}(X(t)) = \lambda t E[Y_1^2] \tag{5.25}
\]


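Example 5.26 Suppose that families migrate to an area at a Poisson rate $\lambda = 2$
per week. If the number of people in each family is independent and takes on the
values 1, 2, 3, 4 with respective probabilities $\frac{1}{6}, \frac{1}{3}, \frac{1}{3}, \frac{1}{6}$, then what is the expected
value and variance of the number of individuals migrating to this area during a
fixed five-week period?


Solution: Letting $Y_i$ denote the number of people in the $i$th family, we have

\[
E[Y_i] = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{3} + 3 \cdot \tfrac{1}{3} + 4 \cdot \tfrac{1}{6} = \tfrac{5}{2},
\]
\[
E[Y_i^2] = 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{3} + 3^2 \cdot \tfrac{1}{3} + 4^2 \cdot \tfrac{1}{6} = \tfrac{43}{6}
\]

Hence, letting $X(5)$ denote the number of immigrants during a five-week period,
we obtain from Equations (5.24) and (5.25) that

\[
E[X(5)] = 2 \cdot 5 \cdot \tfrac{5}{2} = 25
\]

and

\[
\mathrm{Var}[X(5)] = 2 \cdot 5 \cdot \tfrac{43}{6} = \tfrac{215}{3} \qquad \blacksquare
\]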
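As a small check of Example 5.26, the Python sketch below evaluates Equations (5.24) and (5.25) in exact rational arithmetic. It assumes the family-size probabilities are 1/6, 1/3, 1/3, 1/6, which agrees with the moments E[Y] = 5/2 and E[Y^2] = 43/6 quoted in Example 5.28. The helper name `compound_poisson_moments` is hypothetical.

```python
from fractions import Fraction

def compound_poisson_moments(rate, t, pmf):
    """Return (E[X(t)], Var(X(t))) = (rate*t*E[Y], rate*t*E[Y^2]) for a jump pmf."""
    ey = sum(y * p for y, p in pmf.items())
    ey2 = sum(y * y * p for y, p in pmf.items())
    return rate * t * ey, rate * t * ey2

# Assumed family-size distribution for Example 5.26.
family_size = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 3), 4: Fraction(1, 6)}

mean, var = compound_poisson_moments(2, 5, family_size)
```

Using `Fraction` keeps the result exact, so the variance comes out as the fraction 215/3 rather than a rounded decimal.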


Example 5.27 (Busy Periods in Single-Server Poisson Arrival Queues) Consider
a single-server service station in which customers arrive according to a Poisson
process having rate $\lambda$. An arriving customer is immediately served if the server
is free; if not, the customer waits in line (that is, he or she joins the queue). The
successive service times are independent with a common distribution.

Such a system will alternate between idle periods when there are no customers 
in the system, so the server is idle, and busy periods when there are customers in 
the system, so the server is busy. A busy period will begin when an arrival finds 
the system empty, and because of the memoryless property of the Poisson arrivals 
it follows that the distribution of the length of a busy period will be the same for 
each such period. Let B denote the length of a busy period. We will compute its 
mean and variance. 

To begin, let $S$ denote the service time of the first customer in the busy period
and let $N(S)$ denote the number of arrivals during that time. Now, if $N(S) = 0$
then the busy period will end when the initial customer completes his service,
and so $B$ will equal $S$ in this case. Now, suppose that one customer arrives during
the service time of the initial customer. Then, at time $S$ there will be a single
customer in the system who is just about to enter service. Because the arrival
stream from time $S$ on will still be a Poisson process with rate $\lambda$, it thus follows
that the additional time from $S$ until the system becomes empty will have the
same distribution as a busy period. That is, if $N(S) = 1$ then

\[
B = S + B_1
\]

where $B_1$ is independent of $S$ and has the same distribution as $B$.

Now, consider the general case where $N(S) = n$, so there will be $n$ customers
waiting when the server finishes his initial service. To determine the distribution
of remaining time in the busy period note that the order in which customers are
served will not affect the remaining time. Hence, let us suppose that the $n$ arrivals,
call them $C_1, \ldots, C_n$, during the initial service period are served as follows.
Customer $C_1$ is served first, but $C_2$ is not served until the only customers in the
system are $C_2, \ldots, C_n$. For instance, any customers arriving during $C_1$'s service
time will be served before $C_2$. Similarly, $C_3$ is not served until the system is free
of all customers but $C_3, \ldots, C_n$, and so on. A little thought reveals that the times
between the beginnings of service of customers $C_i$ and $C_{i+1}$, $i = 1, \ldots, n - 1$,
and the time from the beginning of service of $C_n$ until there are no customers
in the system, are independent random variables, each distributed as a busy
period.

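It follows from the preceding that if we let $B_1, B_2, \ldots$ be a sequence of
independent random variables, each distributed as a busy period, then we can express $B$ as

\[
B = S + \sum_{i=1}^{N(S)} B_i
\]

Hence,

\[
E[B \mid S] = S + E\left[ \sum_{i=1}^{N(S)} B_i \,\Big|\, S \right]
\]

and

\[
\mathrm{Var}(B \mid S) = \mathrm{Var}\left( \sum_{i=1}^{N(S)} B_i \,\Big|\, S \right)
\]

However, given $S$, $\sum_{i=1}^{N(S)} B_i$ is a compound Poisson random variable, and thus
from Equations (5.24) and (5.25) we obtain

\[
E[B \mid S] = S + \lambda S E[B] = (1 + \lambda E[B]) S
\]
\[
\mathrm{Var}(B \mid S) = \lambda S E[B^2]
\]

Hence,

\[
E[B] = E[E[B \mid S]] = (1 + \lambda E[B]) E[S]
\]

implying, provided that $\lambda E[S] < 1$, that

\[
E[B] = \frac{E[S]}{1 - \lambda E[S]}
\]

Also, by the conditional variance formula

\[
\begin{aligned}
\mathrm{Var}(B) &= \mathrm{Var}(E[B \mid S]) + E[\mathrm{Var}(B \mid S)] \\
&= (1 + \lambda E[B])^2 \mathrm{Var}(S) + \lambda E[S] E[B^2] \\
&= (1 + \lambda E[B])^2 \mathrm{Var}(S) + \lambda E[S] (\mathrm{Var}(B) + E^2[B])
\end{aligned}
\]

yielding

\[
\mathrm{Var}(B) = \frac{\mathrm{Var}(S)(1 + \lambda E[B])^2 + \lambda E[S] E^2[B]}{1 - \lambda E[S]}
\]

Using $E[B] = E[S]/(1 - \lambda E[S])$, we obtain

\[
\mathrm{Var}(B) = \frac{\mathrm{Var}(S) + \lambda E^3[S]}{(1 - \lambda E[S])^3} \qquad \blacksquare
\]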
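To make the busy-period result concrete, the Python sketch below evaluates E[B] = E[S]/(1 - λE[S]) and Var(B) = (Var(S) + λE³[S])/(1 - λE[S])³ for given inputs. The function name `busy_period_moments` and the choice of exponential service times with rate 2 and arrival rate 1 are assumptions made for illustration only; they do not come from the text.

```python
from fractions import Fraction

def busy_period_moments(lam, es, var_s):
    """Return (E[B], Var(B)) given arrival rate lam, E[S], and Var(S)."""
    rho = lam * es
    if rho >= 1:
        raise ValueError("the formulas require lam * E[S] < 1")
    mean_b = es / (1 - rho)
    var_b = (var_s + lam * es**3) / (1 - rho) ** 3
    return mean_b, var_b

# Illustrative inputs: exponential service with rate mu, so E[S] = 1/mu, Var(S) = 1/mu^2.
mu = Fraction(2)
mean_b, var_b = busy_period_moments(Fraction(1), 1 / mu, 1 / mu**2)
```

For exponential service the two formulas simplify algebraically to 1/(μ - λ) and (μ + λ)/(μ - λ)³, which gives a quick sanity check on the output.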


There is a very nice representation of the compound Poisson process when the
set of possible values of the $Y_i$ is finite or countably infinite. So let us suppose
that there are numbers $a_j$, $j \ge 1$, such that

\[
P\{Y_i = a_j\} = p_j, \qquad \sum_j p_j = 1
\]

Now, a compound Poisson process arises when events occur according to a
Poisson process and each event results in a random amount $Y$ being added to the
cumulative sum. Let us say that the event is a type $j$ event whenever it results
in adding the amount $a_j$, $j \ge 1$. That is, the $i$th event of the Poisson process is a
type $j$ event if $Y_i = a_j$. If we let $N_j(t)$ denote the number of type $j$ events by time $t$,
then it follows from Proposition 5.2 that the random variables $N_j(t)$, $j \ge 1$, are
independent Poisson random variables with respective means

\[
E[N_j(t)] = \lambda p_j t
\]

Since, for each $j$, the amount $a_j$ is added to the cumulative sum a total of $N_j(t)$
times by time $t$, it follows that the cumulative sum at time $t$ can be expressed as

\[
X(t) = \sum_j a_j N_j(t) \tag{5.26}
\]
j 
As a check of Equation (5.26), let us use it to compute the mean and variance
of $X(t)$. This yields

\[
\begin{aligned}
E[X(t)] &= E\left[ \sum_j a_j N_j(t) \right] \\
&= \sum_j a_j E[N_j(t)] \\
&= \sum_j a_j \lambda p_j t \\
&= \lambda t E[Y_1]
\end{aligned}
\]

Also,

\[
\begin{aligned}
\mathrm{Var}[X(t)] &= \mathrm{Var}\left[ \sum_j a_j N_j(t) \right] \\
&= \sum_j a_j^2 \mathrm{Var}[N_j(t)] \quad \text{by the independence of the } N_j(t),\ j \ge 1 \\
&= \sum_j a_j^2 \lambda p_j t \\
&= \lambda t E[Y_1^2]
\end{aligned}
\]

where the next to last equality follows since the variance of the Poisson random
variable $N_j(t)$ is equal to its mean.

Thus, we see that the representation (5.26) results in the same expressions for
the mean and variance of $X(t)$ as were previously derived.

One of the uses of the representation (5.26) is that it enables us to conclude
that as $t$ grows large, the distribution of $X(t)$ converges to the normal
distribution. To see why, note first that it follows by the central limit theorem that the
distribution of a Poisson random variable converges to a normal distribution as
its mean increases. (Why is this?) Therefore, each of the random variables $N_j(t)$
converges to a normal random variable as $t$ increases. Because they are
independent, and because the sum of independent normal random variables is also
normal, it follows that $X(t)$ also approaches a normal distribution as $t$ increases.


Example 5.28 In Example 5.26, find the approximate probability that at least
240 people migrate to the area within the next 50 weeks.


Solution: Since $\lambda = 2$, $E[Y_i] = 5/2$, $E[Y_i^2] = 43/6$, we see that

\[
E[X(50)] = 250, \qquad \mathrm{Var}[X(50)] = 4300/6
\]

Now, the desired probability is

\[
\begin{aligned}
P\{X(50) \ge 240\} &= P\{X(50) \ge 239.5\} \\
&= P\left\{ \frac{X(50) - 250}{\sqrt{4300/6}} \ge \frac{239.5 - 250}{\sqrt{4300/6}} \right\} \\
&\approx 1 - \Phi(-0.3922) \\
&= \Phi(0.3922) \\
&= 0.6525
\end{aligned}
\]

where Table 2.3 was used to determine $\Phi(0.3922)$, the probability that a
standard normal is less than 0.3922. ■
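The table lookup in Example 5.28 can be reproduced in a few lines. This Python sketch computes the standard normal distribution function from the error function in the standard library, using the values λ = 2, E[Y] = 5/2, and E[Y²] = 43/6 from the solution. The helper name `std_normal_cdf` is an illustrative choice.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lam, t = 2.0, 50.0
ey, ey2 = 5.0 / 2.0, 43.0 / 6.0

mean = lam * t * ey                    # E[X(50)] from Equation (5.24)
var = lam * t * ey2                    # Var[X(50)] from Equation (5.25)
z = (239.5 - mean) / math.sqrt(var)    # standardized value after continuity correction
prob = 1.0 - std_normal_cdf(z)         # normal approximation to P{X(50) >= 240}
```

The result agrees with the tabulated value to the precision a printed normal table allows.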
Another useful result is that if $\{X(t), t \ge 0\}$ and $\{Y(t), t \ge 0\}$ are independent
compound Poisson processes with respective Poisson parameters and
distributions $\lambda_1, F_1$ and $\lambda_2, F_2$, then $\{X(t) + Y(t), t \ge 0\}$ is also a compound Poisson
process. This is true because in this combined process events will occur
according to a Poisson process with rate $\lambda_1 + \lambda_2$, and each event independently will be
from the first compound Poisson process with probability $\lambda_1/(\lambda_1 + \lambda_2)$.
Consequently, the combined process will be a compound Poisson process with Poisson
parameter $\lambda_1 + \lambda_2$, and with distribution function $F$ given by

\[
F(x) = \frac{\lambda_1}{\lambda_1 + \lambda_2} F_1(x) + \frac{\lambda_2}{\lambda_1 + \lambda_2} F_2(x)
\]


5.4.3 Conditional or Mixed Poisson Processes 


Let $\{N(t), t \ge 0\}$ be a counting process whose probabilities are defined as follows.
There is a positive random variable $L$ such that, conditional on $L = \lambda$, the counting
process is a Poisson process with rate $\lambda$. Such a counting process is called a
conditional or a mixed Poisson process.

Suppose that $L$ is continuous with density function $g$. Because

\[
\begin{aligned}
P\{N(t + s) - N(s) = n\} &= \int_0^\infty P\{N(t + s) - N(s) = n \mid L = \lambda\}\, g(\lambda)\, d\lambda \\
&= \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^n}{n!} g(\lambda)\, d\lambda
\end{aligned}
\tag{5.27}
\]


we see that a conditional Poisson process has stationary increments. However, 
because knowing how many events occur in an interval gives information about 
the possible value of L, which affects the distribution of the number of events in 
any other interval, it follows that a conditional Poisson process does not generally 
have independent increments. Consequently, a conditional Poisson process is not 
generally a Poisson process. 


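Example 5.29 If $g$ is the gamma density with parameters $m$ and $\theta$,

\[
g(\lambda) = \theta e^{-\theta \lambda} \frac{(\theta \lambda)^{m-1}}{(m-1)!}, \quad \lambda > 0
\]

then

\[
\begin{aligned}
P\{N(t) = n\} &= \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^n}{n!}\, \theta e^{-\theta \lambda} \frac{(\theta \lambda)^{m-1}}{(m-1)!}\, d\lambda \\
&= \frac{t^n \theta^m}{n!\,(m-1)!} \int_0^\infty e^{-(t+\theta)\lambda} \lambda^{n+m-1}\, d\lambda
\end{aligned}
\]

Multiplying and dividing by $\frac{(n+m-1)!}{(t+\theta)^{n+m}}$ gives

\[
P\{N(t) = n\} = \frac{t^n \theta^m (n+m-1)!}{n!\,(m-1)!\,(t+\theta)^{n+m}} \int_0^\infty (t+\theta) e^{-(t+\theta)\lambda} \frac{((t+\theta)\lambda)^{n+m-1}}{(n+m-1)!}\, d\lambda
\]

Because $(t+\theta) e^{-(t+\theta)\lambda} ((t+\theta)\lambda)^{n+m-1}/(n+m-1)!$ is the density function of a
gamma $(n + m, t + \theta)$ random variable, its integral is 1, giving the result

\[
P\{N(t) = n\} = \binom{n+m-1}{n} \left( \frac{\theta}{t+\theta} \right)^m \left( \frac{t}{t+\theta} \right)^n
\]

Therefore, the number of events in an interval of length $t$ has the same distribution
as the number of failures that occur before a total of $m$ successes are amassed,
when each trial is a success with probability $\frac{\theta}{t+\theta}$. ■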
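The closed form in Example 5.29 can be compared against a direct numerical evaluation of the mixing integral in Equation (5.27). The Python sketch below does this with a midpoint rule, assuming (as the factorial in the gamma density requires) that m is a positive integer. The function names, the truncation point `upper`, and the test values n = 3, t = 2, m = 2, θ = 1.5 are arbitrary illustrative choices.

```python
from math import comb, exp, factorial

def mixed_pmf_closed_form(n, t, m, theta):
    """P{N(t) = n} in negative binomial form, with success probability theta/(t+theta)."""
    p = theta / (t + theta)
    return comb(n + m - 1, n) * p**m * (1.0 - p) ** n

def mixed_pmf_by_integration(n, t, m, theta, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of the integral in (5.27) with a gamma(m, theta) density."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        poisson_term = exp(-lam * t) * (lam * t) ** n / factorial(n)
        gamma_density = theta * exp(-theta * lam) * (theta * lam) ** (m - 1) / factorial(m - 1)
        total += poisson_term * gamma_density
    return total * h

closed = mixed_pmf_closed_form(3, 2.0, 2, 1.5)
numeric = mixed_pmf_by_integration(3, 2.0, 2, 1.5)
```

The integrand decays exponentially in λ, so truncating the integral at a moderately large upper limit loses a negligible amount of mass for these inputs.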


To compute the mean and variance of $N(t)$, condition on $L$. Because,
conditional on $L$, $N(t)$ is Poisson with mean $Lt$, we obtain

\[
E[N(t) \mid L] = Lt
\]
\[
\mathrm{Var}(N(t) \mid L) = Lt
\]

where the final equality used that the variance of a Poisson random variable is
equal to its mean. Consequently, the conditional variance formula yields

\[
\begin{aligned}
\mathrm{Var}(N(t)) &= E[Lt] + \mathrm{Var}(Lt) \\
&= t E[L] + t^2 \mathrm{Var}(L)
\end{aligned}
\]
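The variance formula can be cross-checked against Example 5.29. When L is gamma with parameters m and θ, its mean is m/θ and its variance is m/θ², and N(t) counts failures before m successes with success probability p = θ/(t + θ), a distribution whose variance is m(1 - p)/p². The Python sketch below confirms that the two routes agree exactly; the function name and the specific values of m, θ, and t are illustrative assumptions.

```python
from fractions import Fraction

def mixed_poisson_variance(t, mean_l, var_l):
    """Var(N(t)) = t E[L] + t^2 Var(L) for a conditional Poisson process."""
    return t * mean_l + t**2 * var_l

# Illustrative values (not from the text).
m, theta, t = 3, Fraction(2), Fraction(5)

via_formula = mixed_poisson_variance(t, m / theta, m / theta**2)

p = theta / (t + theta)                      # success probability from Example 5.29
via_negative_binomial = m * (1 - p) / p**2   # variance of the number of failures
```

Note that the variance exceeds the mean tE[L] whenever Var(L) > 0, which is one way to see that such a process is not a Poisson process.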


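We can compute the conditional distribution function of $L$, given that
$N(t) = n$, as follows.

\[
\begin{aligned}
P\{L \le x \mid N(t) = n\} &= \frac{P\{L \le x, N(t) = n\}}{P\{N(t) = n\}} \\
&= \frac{\int_0^\infty P\{L \le x, N(t) = n \mid L = \lambda\}\, g(\lambda)\, d\lambda}{P\{N(t) = n\}} \\
&= \frac{\int_0^x P\{N(t) = n \mid L = \lambda\}\, g(\lambda)\, d\lambda}{P\{N(t) = n\}} \\
&= \frac{\int_0^x e^{-\lambda t} (\lambda t)^n g(\lambda)\, d\lambda}{\int_0^\infty e^{-\lambda t} (\lambda t)^n g(\lambda)\, d\lambda}
\end{aligned}
\]

where the final equality used Equation (5.27). In other words, the conditional
density function of $L$ given that $N(t) = n$ is

\[
f_{L \mid N(t)}(\lambda \mid n) = \frac{e^{-\lambda t} \lambda^n g(\lambda)}{\int_0^\infty e^{-\lambda t} \lambda^n g(\lambda)\, d\lambda}, \quad \lambda \ge 0 \tag{5.28}
\]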


Example 5.30 An insurance company feels that each of its policyholders has a
rating value and that a policyholder having rating value $\lambda$ will make claims at
times distributed according to a Poisson process with rate $\lambda$, when time is
measured in years. The firm also believes that rating values vary from policyholder to
policyholder, with the probability distribution of the value of a new policyholder
being uniformly distributed over (0, 1). Given that a policyholder has made $n$
claims in his or her first $t$ years, what is the conditional distribution of the time
until the policyholder's next claim?


Solution: If $T$ is the time until the next claim, then we want to compute
$P\{T > x \mid N(t) = n\}$. Conditioning on the policyholder's rating value gives,
upon using Equation (5.28),

\[
\begin{aligned}
P\{T > x \mid N(t) = n\} &= \int_0^\infty P\{T > x \mid L = \lambda, N(t) = n\}\, f_{L \mid N(t)}(\lambda \mid n)\, d\lambda \\
&= \frac{\int_0^1 e^{-\lambda x} e^{-\lambda t} \lambda^n\, d\lambda}{\int_0^1 e^{-\lambda t} \lambda^n\, d\lambda} \qquad \blacksquare
\end{aligned}
\]
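The answer to Example 5.30 is a ratio of two integrals over (0, 1), which is easy to evaluate numerically. The Python sketch below uses a midpoint rule for both integrals; the function name `next_claim_survival` and the default number of steps are illustrative assumptions, and no particular values of n or t are implied by the text.

```python
import math

def next_claim_survival(x, n, t, steps=20_000):
    """Approximate P{T > x | N(t) = n} when the rating value is uniform on (0, 1)."""
    h = 1.0 / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        weight = math.exp(-lam * t) * lam**n      # unnormalized posterior density of L
        num += math.exp(-lam * x) * weight        # times P{T > x | L = lam}
        den += weight
    return num / den

# Illustrative query: no claims in the first year, survival past one more year.
survival = next_claim_survival(1.0, 0, 1.0)
```

For n = 0 both integrals have elementary closed forms, so this case can be verified by hand: the ratio is ((1 - e^{-(x+t)})/(x + t)) divided by ((1 - e^{-t})/t).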


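There is a nice formula for the probability that more than $n$ events occur in an
interval of length $t$. In deriving it we will use the identity

\[
\sum_{j=n+1}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!} = \int_0^t \lambda e^{-\lambda x} \frac{(\lambda x)^n}{n!}\, dx \tag{5.29}
\]

which follows by noting that it equates the probability that the number of events
by time $t$ of a Poisson process with rate $\lambda$ is greater than $n$ with the probability
that the time of the $(n + 1)$st event of this process (which has a gamma $(n + 1, \lambda)$
distribution) is less than $t$. Interchanging $\lambda$ and $t$ in Equation (5.29) yields the
equivalent identity

\[
\sum_{j=n+1}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!} = \int_0^\lambda t e^{-t x} \frac{(t x)^n}{n!}\, dx \tag{5.30}
\]

Using Equation (5.27) we now have

\[
\begin{aligned}
P\{N(t) > n\} &= \sum_{j=n+1}^{\infty} \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^j}{j!} g(\lambda)\, d\lambda \\
&= \int_0^\infty \sum_{j=n+1}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!} g(\lambda)\, d\lambda \quad \text{(by interchanging)} \\
&= \int_0^\infty \int_0^\lambda t e^{-t x} \frac{(t x)^n}{n!}\, dx\, g(\lambda)\, d\lambda \quad \text{(using (5.30))} \\
&= \int_0^\infty \int_x^\infty g(\lambda)\, d\lambda\; t e^{-t x} \frac{(t x)^n}{n!}\, dx \quad \text{(by interchanging)} \\
&= \int_0^\infty \bar{G}(x)\, t e^{-t x} \frac{(t x)^n}{n!}\, dx
\end{aligned}
\]

where $\bar{G}(x) = \int_x^\infty g(\lambda)\, d\lambda = P\{L > x\}$.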
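Identity (5.29), which equates a Poisson tail probability with a gamma distribution function, can be spot-checked numerically. The Python sketch below computes the left side as one minus a finite sum and the right side with a midpoint rule; the function names and the test point λ = 2, t = 1.5, n = 3 are arbitrary illustrative choices.

```python
import math

def poisson_tail(lam, t, n):
    """P{N(t) > n} for a Poisson process with rate lam: left side of (5.29)."""
    head = sum(math.exp(-lam * t) * (lam * t) ** j / math.factorial(j) for j in range(n + 1))
    return 1.0 - head

def gamma_cdf(lam, t, n, steps=20_000):
    """P{S_(n+1) < t} for a gamma(n + 1, lam) time: right side of (5.29), by midpoint rule."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += lam * math.exp(-lam * x) * (lam * x) ** n / math.factorial(n)
    return total * h

left = poisson_tail(2.0, 1.5, 3)
right = gamma_cdf(2.0, 1.5, 3)
```

Both sides depend on λ and t only through the product λt on the left, which is what makes the interchange of λ and t leading to (5.30) possible.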


Exercises 


1. The time $T$ required to repair a machine is an exponentially distributed random
variable with mean $\frac{1}{2}$ (hours).

(a) What is the probability that a repair time exceeds $\frac{1}{2}$ hour?

(b) What is the probability that a repair takes at least $12\frac{1}{2}$ hours given that its
duration exceeds 12 hours?

2. Suppose that you arrive at a single-teller bank to find five other customers in the
bank, one being served and the other four waiting in line. You join the end of the
line. If the service times are all exponential with rate $\mu$, what is the expected amount
of time you will spend in the bank?

3. Let $X$ be an exponential random variable. Without any computations, tell which
one of the following is correct. Explain your answer.

(a) $E[X^2 \mid X > 1] = E[(X + 1)^2]$

(b) $E[X^2 \mid X > 1] = E[X^2] + 1$

(c) $E[X^2 \mid X > 1] = (1 + E[X])^2$

4. Consider a post office with two clerks. Three people, A, B, and C, enter
simultaneously. A and B go directly to the clerks, and C waits until either A or B leaves before
he begins service. What is the probability that A is still in the post office after the
other two have left when

(a) the service time for each clerk is exactly (nonrandom) ten minutes?

(b) the service times are $i$ with probability $\frac{1}{3}$, $i = 1, 2, 3$?

(c) the service times are exponential with mean $1/\mu$?

5. The lifetime of a radio is exponentially distributed with a mean of ten years. If Jones
buys a ten-year-old radio, what is the probability that it will be working after an
additional ten years?
6. In Example 5.3 if server $i$ serves at an exponential rate $\lambda_i$, $i = 1, 2$, show that

\[
P\{\text{Smith is not last}\} = \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^2 + \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} \right)^2
\]


7. If $X_1$ and $X_2$ are independent nonnegative continuous random variables, show that

\[
P\{X_1 < X_2 \mid \min(X_1, X_2) = t\} = \frac{r_1(t)}{r_1(t) + r_2(t)}
\]

where $r_i(t)$ is the failure rate function of $X_i$.

8. Let $X_i$, $i = 1, \ldots, n$ be independent continuous random variables, with $X_i$ having
failure rate function $r_i(t)$. Let $T$ be independent of this sequence, and suppose that
$\sum_{i=1}^{n} P\{T = i\} = 1$. Show that the failure rate function $r(t)$ of $X_T$ is given by

\[
r(t) = \sum_{i=1}^{n} r_i(t)\, P\{T = i \mid X > t\}
\]

9. Machine 1 is currently working. Machine 2 will be put in use at a time $t$ from
now. If the lifetime of machine $i$ is exponential with rate $\lambda_i$, $i = 1, 2$, what is the
probability that machine 1 is the first machine to fail?

*10. Let $X$ and $Y$ be independent exponential random variables with respective rates $\lambda$
and $\mu$. Let $M = \min(X, Y)$. Find

(a) $E[MX \mid M = X]$,

(b) $E[MX \mid M = Y]$,

(c) $\mathrm{Cov}(X, M)$.

11. Let $X, Y_1, \ldots, Y_n$ be independent exponential random variables; $X$ having rate $\lambda$,
and $Y_i$ having rate $\mu$. Let $A_j$ be the event that the $j$th smallest of these $n + 1$ random
variables is one of the $Y_i$. Find $p = P\{X > \max_i Y_i\}$, by using the identity

\[
p = P(A_1 \cdots A_n) = P(A_1) P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cdots A_{n-1})
\]

Verify your answer when $n = 2$ by conditioning on $X$ to obtain $p$.

12. If $X_i$, $i = 1, 2, 3$, are independent exponential random variables with rates $\lambda_i$,
$i = 1, 2, 3$, find

(a) $P\{X_1 < X_2 < X_3\}$,

(b) $P\{X_1 < X_2 \mid \max(X_1, X_2, X_3) = X_3\}$,

(c) $E[\max X_i \mid X_1 < X_2 < X_3]$,

(d) $E[\max X_i]$.

13. Find, in Example 5.10, the expected time until the $n$th person on line leaves the line
(either by entering service or departing without service).

14. Let $X$ be an exponential random variable with rate $\lambda$.

(a) Use the definition of conditional expectation to determine $E[X \mid X < c]$.

(b) Now determine $E[X \mid X < c]$ by using the following identity:

\[
E[X] = E[X \mid X < c] P\{X < c\} + E[X \mid X > c] P\{X > c\}
\]
15. One hundred items are simultaneously put on a life test. Suppose the lifetimes of
the individual items are independent exponential random variables with mean 200
hours. The test will end when there have been a total of 5 failures. If $T$ is the time
at which the test ends, find $E[T]$ and $\mathrm{Var}(T)$.

16. There are three jobs that need to be processed, with the processing time of job $i$
being exponential with rate $\mu_i$. There are two processors available, so processing
on two of the jobs can immediately start, with processing on the final job to start
when one of the initial ones is finished.

(a) Let $T_i$ denote the time at which the processing of job $i$ is completed. If the
objective is to minimize $E[T_1 + T_2 + T_3]$, which jobs should be initially
processed if $\mu_1 < \mu_2 < \mu_3$?

(b) Let $M$, called the makespan, be the time until all three jobs have been processed.
With $S$ equal to the time that there is only a single processor working, show
that

\[
2E[M] = E[S] + \sum_{i=1}^{3} 1/\mu_i
\]

For the rest of this problem, suppose that $\mu_1 = \mu_2 = \mu$, $\mu_3 = \lambda$. Also, let
$P(\mu)$ be the probability that the last job to finish is either job 1 or job 2, and
let $P(\lambda) = 1 - P(\mu)$ be the probability that the last job to finish is job 3.

(c) Express $E[S]$ in terms of $P(\mu)$ and $P(\lambda)$.
Let $P_{i,j}(\mu)$ be the value of $P(\mu)$ when $i$ and $j$ are the jobs that are initially
started.

(d) Show that $P_{1,2}(\mu) \le P_{1,3}(\mu)$.

(e) If $\mu > \lambda$ show that $E[M]$ is minimized when job 3 is one of the jobs that is
initially started.

(f) If $\mu < \lambda$ show that $E[M]$ is minimized when processing is initially started on
jobs 1 and 2.

17. A set of $n$ cities is to be connected via communication links. The cost to construct a
link between cities $i$ and $j$ is $C_{ij}$, $i \ne j$. Enough links should be constructed so that
for each pair of cities there is a path of links that connects them. As a result, only
$n - 1$ links need be constructed. A minimal cost algorithm for solving this problem
(known as the minimal spanning tree problem) first constructs the cheapest of all the
$\binom{n}{2}$ links. Then, at each additional stage it chooses the cheapest link that connects
a city without any links to one with links. That is, if the first link is between cities
1 and 2, then the second link will either be between 1 and one of the cities $3, \ldots, n$
or between 2 and one of the cities $3, \ldots, n$. Suppose that all of the $\binom{n}{2}$ costs $C_{ij}$ are
independent exponential random variables with mean 1. Find the expected cost of
the preceding algorithm if

(a) $n = 3$,

(b) $n = 4$.

*18. Let $X_1$ and $X_2$ be independent exponential random variables, each having
rate $\mu$. Let

\[
X_{(1)} = \text{minimum}(X_1, X_2) \quad \text{and} \quad X_{(2)} = \text{maximum}(X_1, X_2)
\]
Find

(a) $E[X_{(1)}]$,
(b) $\mathrm{Var}[X_{(1)}]$,
(c) $E[X_{(2)}]$,
(d) $\mathrm{Var}[X_{(2)}]$.

19. Repeat Exercise 18, but this time suppose that the $X_i$ are independent exponentials
with respective rates $\mu_i$, $i = 1, 2$.

20. Consider a two-server system in which a customer is served first by server 1, then
by server 2, and then departs. The service times at server $i$ are exponential random
variables with rates $\mu_i$, $i = 1, 2$. When you arrive, you find server 1 free and two
customers at server 2: customer A in service and customer B waiting in line.

(a) Find $P_A$, the probability that A is still in service when you move over to server 2.

(b) Find $P_B$, the probability that B is still in the system when you move over to
server 2.

(c) Find $E[T]$, where $T$ is the time that you spend in the system.

Hint: Write

\[
T = S_1 + S_2 + W_A + W_B
\]

where $S_i$ is your service time at server $i$, $W_A$ is the amount of time you wait in queue
while A is being served, and $W_B$ is the amount of time you wait in queue while B
is being served.

21. In a certain system, a customer must first be served by server 1 and then by server 2.
The service times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2$. An arrival finding
server 1 busy waits in line for that server. Upon completion of service at server 1,
a customer either enters service with server 2 if that server is free or else remains
with server 1 (blocking any other customer from entering service) until server 2 is
free. Customers depart the system after being served by server 2. Suppose that when
you arrive there is one customer in the system and that customer is being served by
server 1. What is the expected total time you spend in the system?

22. Suppose in Exercise 21 you arrive to find two others in the system, one being served
by server 1 and one by server 2. What is the expected time you spend in the system?
Recall that if server 1 finishes before server 2, then server 1's customer will remain
with him (thus blocking your entrance) until server 2 becomes free.

*23. A flashlight needs two batteries to be operational. Consider such a flashlight along
with a set of $n$ functional batteries: battery 1, battery 2, ..., battery $n$. Initially,
battery 1 and 2 are installed. Whenever a battery fails, it is immediately replaced by
the lowest numbered functional battery that has not yet been put in use. Suppose
that the lifetimes of the different batteries are independent exponential random
variables each having rate $\mu$. At a random time, call it $T$, a battery will fail and
our stockpile will be empty. At that moment exactly one of the batteries, which
we call battery $X$, will not yet have failed.

(a) What is $P\{X = n\}$?

(b) What is $P\{X = 1\}$?

(c) What is $P\{X = i\}$?
(d) Find $E[T]$.

(e) What is the distribution of $T$?

24. There are two servers available to process $n$ jobs. Initially, each server begins work
on a job. Whenever a server completes work on a job, that job leaves the system and
the server begins processing a new job (provided there are still jobs waiting to be
processed). Let $T$ denote the time until all jobs have been processed. If the time that
it takes server $i$ to process a job is exponentially distributed with rate $\mu_i$, $i = 1, 2$,
find $E[T]$ and $\mathrm{Var}(T)$.

25. Customers can be served by any of three servers, where the service times of server $i$
are exponentially distributed with rate $\mu_i$, $i = 1, 2, 3$. Whenever a server becomes
free, the customer who has been waiting the longest begins service with that server.

(a) If you arrive to find all three servers busy and no one waiting, find the expected
time until you depart the system.

(b) If you arrive to find all three servers busy and one person waiting, find the
expected time until you depart the system.

26. Each entering customer must be served first by server 1, then by server 2, and finally
by server 3. The amount of time it takes to be served by server $i$ is an exponential
random variable with rate $\mu_i$, $i = 1, 2, 3$. Suppose you enter the system when it
contains a single customer who is being served by server 3.

(a) Find the probability that server 3 will still be busy when you move over to
server 2.

(b) Find the probability that server 3 will still be busy when you move over to
server 3.

(c) Find the expected amount of time that you spend in the system. (Whenever
you encounter a busy server, you must wait for the service in progress to end
before you can enter service.)

(d) Suppose that you enter the system when it contains a single customer who is
being served by server 2. Find the expected amount of time that you spend in
the system.

27. Show, in Example 5.7, that the distributions of the total cost are the same for the
two algorithms.

28. Consider $n$ components with independent lifetimes, which are such that component
$i$ functions for an exponential time with rate $\lambda_i$. Suppose that all components are
initially in use and remain so until they fail.

(a) Find the probability that component 1 is the second component to fail.

(b) Find the expected time of the second failure.

Hint: Do not make use of part (a).

29. Let $X$ and $Y$ be independent exponential random variables with respective rates $\lambda$
and $\mu$, where $\lambda > \mu$. Let $c > 0$.

(a) Show that the conditional density function of $X$, given that $X + Y = c$, is

\[
f_{X \mid X+Y}(x \mid c) = \frac{(\lambda - \mu) e^{-(\lambda - \mu) x}}{1 - e^{-(\lambda - \mu) c}}, \quad 0 < x < c
\]

(b) Use part (a) to find $E[X \mid X + Y = c]$.

(c) Find $E[Y \mid X + Y = c]$.
30. The lifetimes of A's dog and cat are independent exponential random variables with
respective rates $\lambda_d$ and $\lambda_c$. One of them has just died. Find the expected additional
lifetime of the other pet.

31. A doctor has scheduled two appointments, one at 1 P.M. and the other at 1:30 P.M.
The amounts of time that appointments last are independent exponential random
variables with mean 30 minutes. Assuming that both patients are on time, find
the expected amount of time that the 1:30 appointment spends at the doctor's
office.

32. Let $X$ be a uniform random variable on (0, 1), and consider a counting process
where events occur at times $X + i$, for $i = 0, 1, 2, \ldots$.

(a) Does this counting process have independent increments?

(b) Does this counting process have stationary increments?

33. Let $X$ and $Y$ be independent exponential random variables with respective rates $\lambda$
and $\mu$.

(a) Argue that, conditional on $X > Y$, the random variables $\min(X, Y)$ and $X - Y$
are independent.

(b) Use part (a) to conclude that for any positive constant $c$

\[
E[\min(X, Y) \mid X > Y + c] = E[\min(X, Y) \mid X > Y]
\]

(c) Give a verbal explanation of why $\min(X, Y)$ and $X - Y$ are (unconditionally)
independent.

34. Two individuals, A and B, both require kidney transplants. If she does not receive
a new kidney, then A will die after an exponential time with rate $\mu_A$, and B after
an exponential time with rate $\mu_B$. New kidneys arrive in accordance with a Poisson
process having rate $\lambda$. It has been decided that the first kidney will go to A (or to B
if B is alive and A is not at that time) and the next one to B (if still living).

(a) What is the probability that A obtains a new kidney?

(b) What is the probability that B obtains a new kidney?

35. Show that Definition 5.1 of a Poisson process implies Definition 5.3.

*36. Let $S(t)$ denote the price of a security at time $t$. A popular model for the process
$\{S(t), t \ge 0\}$ supposes that the price remains unchanged until a "shock" occurs, at
which time the price is multiplied by a random factor. If we let $N(t)$ denote the
number of shocks by time $t$, and let $X_i$ denote the $i$th multiplicative factor, then
this model supposes that

\[
S(t) = S(0) \prod_{i=1}^{N(t)} X_i
\]

where $\prod_{i=1}^{N(t)} X_i$ is equal to 1 when $N(t) = 0$. Suppose that the $X_i$ are independent
exponential random variables with rate $\mu$; that $\{N(t), t \ge 0\}$ is a Poisson process
with rate $\lambda$; that $\{N(t), t \ge 0\}$ is independent of the $X_i$; and that $S(0) = s$.

(a) Find $E[S(t)]$.

(b) Find $E[S^2(t)]$.
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37. Amachine works for an exponentially distributed time with rate jz and then fails. A 
repair crew checks the machine at times distributed according to a Poisson process 
with rate A; if the machine is found to have failed then it is immediately replaced. 

Find the expected time between replacements of machines. 

38. Let {Mj(t),¢ > 0}, i = 1,2, 3 be independent Poisson processes with respective rates 

Aj, i= 1,2, and set 
Nit) = Mi(t) + Mo@), = Na(t) = M2(t) + M3) 
The stochastic process {(N1(t), N2(t)), ¢ > 0} is called a bivariate Poisson process. 
(a) Find P{N,(t) =n, No(t) = m}. 
(b) Find Cov(Nj (4), N2(¢)). 

39. A certain scientific theory supposes that mistakes in cell division occur according 
to a Poisson process with rate 2.5 per year, and that an individual dies when 196 
such mistakes have occurred. Assuming this theory, find 
(a) the mean lifetime of an individual, 

(b) the variance of the lifetime of an individual. 

Also approximate 

(c) the probability that an individual dies before age 67.2, 
(d) the probability that an individual reaches age 90, 

(e) the probability that an individual reaches age 100. 

*40. Show that if {N;(£),t > 0} are independent Poisson processes with rate A;, i = 1,2, 
then {N(¢),¢ > 0} is a Poisson process with rate Ay + A2 where N(t) = Ni(f) + 
N2(t). 

41. In Exercise 40 what is the probability that the first event of the combined process 
is from the Ny process? 

42. Let {N(t), t > 0} be a Poisson process with rate A. Let S,, denote the time of the 
nth event. Find 
(a) E[Sa], 

(b) E[S4|NQ) = 2], 
(c) E[N(4) — N@)|N(1) = 3]. 

43. Customers arrive at a two-server service station according to a Poisson process with
rate λ. Whenever a new customer arrives, any customer that is in the system
immediately departs. A new arrival enters service first with server 1 and then with server
2. If the service times at the servers are independent exponentials with respective
rates μ1 and μ2, what proportion of entering customers completes their service
with server 2?

44. Cars pass a certain street location according to a Poisson process with rate λ.
A woman who wants to cross the street at that location waits until she can see 
that no cars will come by in the next T time units. 

(a) Find the probability that her waiting time is 0. 
(b) Find her expected waiting time. 
Hint: Condition on the time of the first car. 
45. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the
nonnegative random variable T with mean μ and variance σ². Find
(a) Cov(T, N(T)), 
(b) Var(N(T)). 
46. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the sequence
X1, X2, ... of independent and identically distributed random variables with mean
μ and variance σ². Find

\[ \operatorname{Cov}\left( N(t), \sum_{i=1}^{N(t)} X_i \right) \]

47. Consider a two-server parallel queuing system where customers arrive according to
a Poisson process with rate λ, and where the service times are exponential with rate
μ. Moreover, suppose that arrivals finding both servers busy immediately depart
without receiving any service (such a customer is said to be lost), whereas those
finding at least one free server immediately enter service and then depart when
their service is completed.

(a) If both servers are presently busy, find the expected time until the next customer
enters the system.

(b) Starting empty, find the expected time until both servers are busy.

(c) Find the expected time between two successive lost customers.


48. Consider an n-server parallel queuing system where customers arrive according
to a Poisson process with rate λ, where the service times are exponential random
variables with rate μ, and where any arrival finding all servers busy immediately
departs without receiving any service. If an arrival finds all servers busy, find

(a) the expected number of busy servers found by the next arrival, 

(b) the probability that the next arrival finds all servers free, 

(c) the probability that the next arrival finds exactly i of the servers free. 


49. Events occur according to a Poisson process with rate λ. Each time an event occurs,
we must decide whether or not to stop, with our objective being to stop at the last
event to occur prior to some specified time T, where T > 1/λ. That is, if an event
occurs at time t, 0 ≤ t ≤ T, and we decide to stop, then we win if there are no
additional events by time T, and we lose otherwise. If we do not stop when an event
occurs and no additional events occur by time T, then we lose. Also, if no events
occur by time T, then we lose. Consider the strategy that stops at the first event to
occur after some fixed time s, 0 ≤ s ≤ T.

(a) Using this strategy, what is the probability of winning?

(b) What value of s maximizes the probability of winning?

(c) Show that one’s probability of winning when using the preceding strategy with
the value of s specified in part (b) is 1/e.


50. The number of hours between successive train arrivals at the station is uniformly
distributed on (0,1). Passengers arrive according to a Poisson process with rate 7 
per hour. Suppose a train has just left the station. Let X denote the number of 
people who get on the next train. Find 

(a) E[X], 

(b) Var(X). 

51. If an individual has never had a previous automobile accident, then the probability
he or she has an accident in the next h time units is βh + o(h); on the other hand, if
he or she has ever had a previous accident, then the probability is αh + o(h). Find
the expected number of accidents an individual has by time t.
52. Teams 1 and 2 are playing a match. The teams score points according to independent
Poisson processes with respective rates λ1 and λ2. If the match ends when one of the
teams has scored k more points than the other, find the probability that team 1 wins.

Hint: Relate this to the gambler’s ruin problem.


53. The water level of a certain reservoir is depleted at a constant rate of 1000 units
daily. The reservoir is refilled by randomly occurring rainfalls. Rainfalls occur 
according to a Poisson process with rate 0.2 per day. The amount of water added 
to the reservoir by a rainfall is 5000 units with probability 0.8 or 8000 units with 
probability 0.2. The present water level is just slightly below 5000 units. 

(a) What is the probability the reservoir will be empty after five days? 

(b) What is the probability the reservoir will be empty sometime within the next
ten days?


54. A viral linear DNA molecule of length, say, 1 is often known to contain a certain
“marked position,” with the exact location of this mark being unknown. One
approach to locating the marked position is to cut the molecule by agents that
break it at points chosen according to a Poisson process with rate λ. It is then
possible to determine the fragment that contains the marked position. For instance,
letting m denote the location on the line of the marked position, then if L1 denotes
the last Poisson event time before m (or 0 if there are no Poisson events in [0, m]),
and R1 denotes the first Poisson event time after m (or 1 if there are no Poisson
events in [m, 1]), then it would be learned that the marked position lies between
L1 and R1. Find

(a) P{L1 = 0},
(b) P{L1 < x}, 0 < x < m,
(c) P{R1 = 1},
(d) P{R1 > x}, m < x < 1.

By repeating the preceding process on identical copies of the DNA molecule, we are
able to zero in on the location of the marked position. If the cutting procedure is
utilized on n identical copies of the molecule, yielding the data Li, Ri, i = 1, ..., n,
then it follows that the marked position lies between L and R, where

\[ L = \max_i L_i, \qquad R = \min_i R_i \]

(e) Find E[R − L], and in doing so, show that E[R − L] ∼ 2/(nλ).


55. Consider a single server queuing system where customers arrive according to
a Poisson process with rate λ, service times are exponential with rate μ, and
customers are served in the order of their arrival. Suppose that a customer arrives 
and finds n — 1 others in the system. Let X denote the number in the system at the 
moment that customer departs. Find the probability mass function of X. 


Hint: Relate this to a negative binomial random variable. 


56. An event independently occurs on each day with probability p. Let N(n) denote
the total number of events that occur on the first n days, and let Tr denote the day
on which the rth event occurs.

(a) What is the distribution of N(n)?

(b) What is the distribution of T1?

(c) What is the distribution of Tr?

(d) Given that N(n) = r, show that the set of r days on which events occurred
has the same distribution as a random selection (without replacement) of r of
the values 1, 2, ..., n.


57. Events occur according to a Poisson process with rate λ = 2 per hour.

(a) What is the probability that no event occurs between 8 P.M. and 9 P.M.? 

(b) Starting at noon, what is the expected time at which the fourth event occurs? 

(c) What is the probability that two or more events occur between 6 P.M. and 
8 P.M.? 


58. Consider the coupon collecting problem where there are m distinct types of
coupons, and each new coupon collected is type j with probability pj, where
p1 + ··· + pm = 1. Suppose you stop collecting when you have a complete set of
at least one of each type. Show that

\[ P\{i \text{ is the last type collected}\} = E\left[ \prod_{j \neq i} \left( 1 - U^{p_j/p_i} \right) \right] \]

where U is a uniform random variable on (0,1).


59. There are two types of claims that are made to an insurance company. Let Ni(t)
denote the number of type i claims made by time t, and suppose that {N1(t), t ≥ 0}
and {N2(t), t ≥ 0} are independent Poisson processes with rates λ1 = 10 and
λ2 = 1. The amounts of successive type 1 claims are independent exponential
random variables with mean $1000 whereas the amounts from type 2 claims are 
independent exponential random variables with mean $5000. A claim for $4000 
has just been received; what is the probability it is a type 1 claim? 


60. Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived
during the first hour. What is the probability that 

(a) both arrived during the first 20 minutes? 

(b) at least one arrived during the first 20 minutes? 


61. A system has a random number of flaws that we will suppose is Poisson distributed
with mean c. Each of these flaws will, independently, cause the system to fail at
a random time having distribution G. When a system failure occurs, suppose that
the flaw causing the failure is immediately located and fixed.

(a) What is the distribution of the number of failures by time t?

(b) What is the distribution of the number of flaws that remain in the system at
time t?

(c) Are the random variables in parts (a) and (b) dependent or independent? 


62. Suppose that the number of typographical errors in a new text is Poisson distributed
with mean λ. Two proofreaders independently read the text. Suppose that each
error is independently found by proofreader i with probability pi, i = 1, 2. Let
X1 denote the number of errors that are found by proofreader 1 but not by
proofreader 2. Let X2 denote the number of errors that are found by proofreader
2 but not by proofreader 1. Let X3 denote the number of errors that are found by
both proofreaders. Finally, let X4 denote the number of errors found by neither
proofreader.
(a) Describe the joint probability distribution of X1, X2, X3, X4.
(b) Show that

\[ \frac{E[X_1]}{E[X_3]} = \frac{1 - p_2}{p_2} \quad \text{and} \quad \frac{E[X_2]}{E[X_3]} = \frac{1 - p_1}{p_1} \]

Suppose now that λ, p1, and p2 are all unknown.

(c) By using X; as an estimator of E[X;], i = 1,2, 3, present estimators of p1, p2, 
and A. 

(d) Give an estimator of X4, the number of errors not found by either proofreader. 


63. Consider an infinite server queuing system in which customers arrive in accordance
with a Poisson process with rate λ, and where the service distribution is exponential
with rate μ. Let X(t) denote the number of customers in the system at time t. Find
(a) E[X(t + s)|X(s) = n];

(b) Var[X(t + s)|X(s) = n].

Hint: Divide the customers in the system at time t + s into two groups, one
consisting of “old” customers and the other of “new” customers.

(c) Consider an infinite server queuing system in which customers arrive
according to a Poisson process with rate λ, and where the service times are
all exponential random variables with rate μ. If there is currently a single
customer in the system, find the probability that the system becomes empty
when that customer departs.


64. Suppose that people arrive at a bus stop in accordance with a Poisson process with
rate λ. The bus departs at time t. Let X denote the total amount of waiting time
of all those who get on the bus at time t. We want to determine Var(X). Let N(t)
denote the number of arrivals by time t.

(a) What is E[X|N(t)]?

(b) Argue that Var[X|N(t)] = N(t)t²/12.

(c) What is Var(X)? 


65. An average of 500 people pass the California bar exam each year. A California
lawyer practices law, on average, for 30 years. Assuming these numbers remain 
steady, how many lawyers would you expect California to have in 2050? 


66. Policyholders of a certain insurance company have accidents at times distributed
according to a Poisson process with rate λ. The amount of time from when the
accident occurs until a claim is made has distribution G.

(a) Find the probability there are exactly n incurred but as yet unreported claims
at time t.

(b) Suppose that each claim amount has distribution F, and that the claim amount
is independent of the time that it takes to report the claim. Find the expected
value of the sum of all incurred but as yet unreported claims at time t.


67. Satellites are launched into space at times distributed according to a Poisson
process with rate λ. Each satellite independently spends a random time (having
distribution G) in space before falling to the ground. Find the probability that
none of the satellites in the air at time t was launched before time s, where s < t.


68. Suppose that electrical shocks having random amplitudes occur at times
distributed according to a Poisson process {N(t), t ≥ 0} with rate λ. Suppose that the
amplitudes of the successive shocks are independent both of other amplitudes and
of the arrival times of shocks, and also that the amplitudes have distribution F
with mean μ. Suppose also that the amplitude of a shock decreases with time at
an exponential rate α, meaning that an initial amplitude A will have value Ae^{−αx}
after an additional time x has elapsed. Let A(t) denote the sum of all amplitudes
at time t. That is,

\[ A(t) = \sum_{i=1}^{N(t)} A_i e^{-\alpha (t - S_i)}, \]

where Ai and Si are the initial amplitude and the arrival time of shock i.

(a) Find E[A(t)] by conditioning on N(t).

(b) Without any computations, explain why A(t) has the same distribution as 
does D(t) of Example 5.21. 


69. Let {N(t), t ≥ 0} be a Poisson process with rate λ. For s < t, find

(a) P(N(t) > N(s));

(b) P(N(s) = 0, N(t) = 3);

(c) E[N(t)|N(s) = 4];

(d) E[N(s)|N(t) = 4].

70. For the infinite server queue with Poisson arrivals and general service distribution
G, find the probability that

(a) the first customer to arrive is also the first to depart.

Let S(t) equal the sum of the remaining service times of all customers in the system
at time t.

(b) Argue that S(t) is a compound Poisson random variable.

(c) Find E[S(t)].

(d) Find Var(S(t)).


71. Let Sn denote the time of the nth event of the Poisson process {N(t), t ≥ 0} having
rate λ. Show, for an arbitrary function g, that the random variable

\[ \sum_{i=1}^{N(t)} g(S_i) \]

has the same distribution as the compound Poisson random variable

\[ \sum_{i=1}^{N} g(U_i), \]

where U1, U2, ... is a sequence of independent and identically distributed uniform
(0, t) random variables that is independent of N, a Poisson random variable with
mean λt. Consequently, conclude that

\[ E\left[ \sum_{i=1}^{N(t)} g(S_i) \right] = \lambda \int_0^t g(x)\,dx, \qquad \operatorname{Var}\left( \sum_{i=1}^{N(t)} g(S_i) \right) = \lambda \int_0^t g^2(x)\,dx \]


72. A cable car starts off with n riders. The times between successive stops of the car
are independent exponential random variables with rate λ. At each stop one rider
gets off. This takes no time, and no additional riders get on. After a rider gets
off the car, he or she walks home. Independently of all else, the walk takes an
exponential time with rate μ.

(a) What is the distribution of the time at which the last rider departs the car?
(b) Suppose the last rider departs the car at time t. What is the probability that
all the other riders are home at that time?
73. Shocks occur according to a Poisson process with rate λ, and each shock
independently causes a certain system to fail with probability p. Let T denote the time at
which the system fails and let N denote the number of shocks that it takes.

(a) Find the conditional distribution of T given that N = n.

(b) Calculate the conditional distribution of N, given that T = t, and notice that
it is distributed as 1 plus a Poisson random variable with mean λ(1 − p)t.

(c) Explain how the result in part (b) could have been obtained without any 
calculations. 


74. The number of missing items in a certain location, call it X, is a Poisson random
variable with mean λ. When searching the location, each item will independently
be found after an exponentially distributed time with rate μ. A reward of R is
received for each item found, and a searching cost of C per unit of search time is
incurred. Suppose that you search for a fixed time t and then stop.

(a) Find your total expected return.

(b) Find the value of t that maximizes the total expected return.

(c) The policy of searching for a fixed time is a static policy. Would a dynamic
policy, which allows the decision as to whether to stop at each time t to depend
on the number already found by t, be beneficial?

Hint: How does the distribution of the number of items not yet found by time t
depend on the number already found by that time?


75. Suppose that the times between successive arrivals of customers at a single-server
station are independent random variables having a common distribution F. 
Suppose that when a customer arrives, he or she either immediately enters service 
if the server is free or else joins the end of the waiting line if the server is busy 
with another customer. When the server completes work on a customer, that 
customer leaves the system and the next waiting customer, if there are any, enters 
service. Let Xn denote the number of customers in the system immediately before
the nth arrival, and let Yn denote the number of customers that remain in the
system when the nth customer departs. The successive service times of customers
are independent random variables (which are also independent of the interarrival
times) having a common distribution G.
(a) If F is the exponential distribution with rate λ, which, if any, of the processes
{Xn}, {Yn} is a Markov chain?
(b) If G is the exponential distribution with rate μ, which, if any, of the processes
{Xn}, {Yn} is a Markov chain?
(c) Give the transition probabilities of any Markov chains in parts (a) and (b).


76. For the model of Example 5.27, find the mean and variance of the number of
customers served in a busy period. 


77. Suppose that customers arrive to a system according to a Poisson process with
rate λ. There are an infinite number of servers in this system so a customer
begins service upon arrival. The service times of the arrivals are independent
exponential random variables with rate μ, and are independent of the arrival
process. Customers depart the system when their service ends. Let N be the number 
of arrivals before the first departure. 

(a) Find P(N = 1). 

(b) Find P(N = 2). 
(c) Find P(N = j).
(d) Find the probability that the first to arrive is the first to depart.
(e) Find the expected time of the first departure.


78. A store opens at 8 A.M. From 8 until 10 A.M. customers arrive at a Poisson rate of
four an hour. Between 10 A.M. and 12 P.M. they arrive at a Poisson rate of eight an
hour. From 12 P.M. to 2 P.M. the arrival rate increases steadily from eight per hour at
12 P.M. to ten per hour at 2 P.M.; and from 2 to 5 P.M. the arrival rate drops steadily
from ten per hour at 2 P.M. to four per hour at 5 P.M. Determine the probability
distribution of the number of customers that enter the store on a given day.


79. Consider a nonhomogeneous Poisson process whose intensity function λ(t) is
bounded and continuous. Show that such a process is equivalent to a process of
counted events from a (homogeneous) Poisson process having rate λ, where an
event at time t is counted (independent of the past) with probability λ(t)/λ; and
where λ is chosen so that λ(s) < λ for all s.
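
The construction described in this exercise can be illustrated numerically. The following is a minimal sketch and not a proof: it generates the events of a homogeneous Poisson process and counts an event at time t with probability equal to the intensity at t divided by the homogeneous rate. The function names, the intensity 1 + t, the bounding rate 3, and the time horizon 2 are all assumptions chosen for illustration; none of them come from the text.

```python
import random

def thinned_event_times(intensity, lam, horizon, rng):
    """Counted event times on [0, horizon], obtained by thinning a rate-lam Poisson process.

    Assumes intensity(s) <= lam for all s in [0, horizon].
    """
    times = []
    t = 0.0
    while True:
        # Interarrival times of the homogeneous process are exponential with rate lam.
        t += rng.expovariate(lam)
        if t > horizon:
            return times
        # Count this event, independently of the past, with probability intensity(t) / lam.
        if rng.random() < intensity(t) / lam:
            times.append(t)

def example_intensity(t):
    # Hypothetical intensity function; it is bounded by 3 on [0, 2].
    return 1.0 + t

rng = random.Random(0)
counts = [len(thinned_event_times(example_intensity, 3.0, 2.0, rng)) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
# For this intensity the mean value function at time 2 is the integral of
# (1 + t) over [0, 2], which equals 4, so mean_count should be close to 4.
```

If the equivalence holds, the average number of counted events should approach the integral of the intensity over the interval, which gives a quick sanity check before attempting the argument itself.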


80. Let T1, T2, ... denote the interarrival times of events of a nonhomogeneous Poisson
process having intensity function λ(t).

(a) Are the Ti independent?

(b) Are the Ti identically distributed?

(c) Find the distribution of T1.
81. (a) Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process with mean value
function m(t). Given N(t) = n, show that the unordered set of arrival times
has the same distribution as n independent and identically distributed random
variables having distribution function

\[ F(x) = \begin{cases} \dfrac{m(x)}{m(t)}, & x \le t \\ 1, & x \ge t \end{cases} \]

(b) Suppose that workmen incur accidents in accordance with a nonhomogeneous
Poisson process with mean value function m(t). Suppose further that each
injured man is out of work for a random amount of time having distribution
F. Let X(t) be the number of workers who are out of work at time t. By using
part (a), find E[X(t)].

82. Let X1, X2, ... be independent positive continuous random variables with a
common density function f, and suppose this sequence is independent of N, a
Poisson random variable with mean λ. Define

N(t) = number of i ≤ N : Xi ≤ t

Show that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity
function λ(t) = λf(t).


83. Suppose that {N0(t), t ≥ 0} is a Poisson process with rate λ = 1. Let λ(t) denote
a nonnegative function of t, and let

\[ m(t) = \int_0^t \lambda(s)\,ds \]

Define N(t) by

N(t) = N0(m(t))

Argue that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity
function λ(t), t ≥ 0.


Hint: Make use of the identity 
m(t + h) — m(t) = m'(t)h + o(h) 


84. Let X1, X2, ... be independent and identically distributed nonnegative continuous
random variables having density function f(x). We say that a record occurs at time
n if Xn is larger than each of the previous values X1, ..., Xn−1. (A record
automatically occurs at time 1.) If a record occurs at time n, then Xn is called a record
value. In other words, a record occurs whenever a new high is reached, and that
new high is called the record value. Let N(t) denote the number of record values
that are less than or equal to t. Characterize the process {N(t), t ≥ 0} when

(a) f is an arbitrary continuous density function.

(b) f(x) = λe^{−λx}.

Hint: Finish the following sentence: There will be a record whose value is between
t and t + dt if the first Xi that is greater than t lies between ...


85. An insurance company pays out claims on its life insurance policies in accordance
with a Poisson process having rate λ = 5 per week. If the amount of money paid
on each policy is exponentially distributed with mean $2000, what is the mean and 
variance of the amount of money paid by the insurance company in a four-week 
span? 

86. In good years, storms occur according to a Poisson process with rate 3 per unit
time, while in other years they occur according to a Poisson process with rate 5
per unit time. Suppose next year will be a good year with probability 0.3. Let N(t)
denote the number of storms during the first t time units of next year.

(a) Find P{N(t) = n}.

(b) Is {N(t)} a Poisson process?

(c) Does {N(t)} have stationary increments? Why or why not?

(d) Does it have independent increments? Why or why not?

(e) If next year starts off with three storms by time t = 1, what is the conditional
probability it is a good year?


87. Determine

Cov[X(t), X(t + s)]

when {X(t), t ≥ 0} is a compound Poisson process.


88. Customers arrive at the automatic teller machine in accordance with a Poisson
process with rate 12 per hour. The amount of money withdrawn on each transaction
is a random variable with mean $30 and standard deviation $50. (A negative
withdrawal means that money was deposited.) The machine is in use for 15 hours daily.
Approximate the probability that the total daily withdrawal is less than $6000. 
89. Some components of a two-component system fail after receiving a shock. Shocks
of three types arrive independently and in accordance with Poisson processes.
Shocks of the first type arrive at a Poisson rate λ1 and cause the first component
to fail. Those of the second type arrive at a Poisson rate λ2 and cause the second
component to fail. The third type of shock arrives at a Poisson rate λ3 and causes
both components to fail. Let X1 and X2 denote the survival times for the two
components. Show that the joint distribution of X1 and X2 is given by

P{X1 > s, X2 > t} = exp{−λ1 s − λ2 t − λ3 max(s, t)}

This distribution is known as the bivariate exponential distribution.


90. In Exercise 89 show that X1 and X2 both have exponential distributions.


91. Let X1, X2, ..., Xn be independent and identically distributed exponential random
variables. Show that the probability that the largest of them is greater than the
sum of the others is n/2^{n−1}. That is, if

\[ M = \max_j X_j \]

then show

\[ P\left\{ M > \sum_{i=1}^{n} X_i - M \right\} = \frac{n}{2^{n-1}} \]

Hint: What is P{X1 > X2 + ··· + Xn}?

92. Prove Equation (5.22).


93. Prove that
(a) max(X1, X2) = X1 + X2 − min(X1, X2) and, in general,

(b)
\[ \max(X_1, \ldots, X_n) = \sum_{i=1}^{n} X_i - \sum\sum_{i<j} \min(X_i, X_j) + \sum\sum\sum_{i<j<k} \min(X_i, X_j, X_k) + \cdots + (-1)^{n-1} \min(X_1, X_2, \ldots, X_n) \]

(c) Show by defining appropriate random variables Xi, i = 1, ..., n, and by
taking expectations in part (b) how to obtain the well-known formula

\[ P\left( \bigcup_{1}^{n} A_i \right) = \sum_{i} P(A_i) - \sum\sum_{i<j} P(A_i A_j) + \cdots + (-1)^{n-1} P(A_1 \cdots A_n) \]

(d) Consider n independent Poisson processes, the ith having rate λi. Derive an
expression for the expected time until an event has occurred in all n processes.


94. A two-dimensional Poisson process is a process of randomly occurring events in
the plane such that

(i) for any region of area A the number of events in that region has a Poisson
distribution with mean λA, and

(ii) the number of events in nonoverlapping regions are independent.

For such a process, consider an arbitrary point in the plane and let X denote its
distance from its nearest event (where distance is measured in the usual Euclidean
manner). Show that
(a) P{X > t} = e^{−λπt²},

(b) E[X] = 1/(2√λ).


95. Let {N(t), t ≥ 0} be a conditional Poisson process with a random rate L.
(a) Derive an expression for E[L|N(t) = n].
(b) Find, for s > t, E[N(s)|N(t) = n].
(c) Find, for s < t, E[N(s)|N(t) = n].

96. For the conditional Poisson process, let m1 = E[L], m2 = E[L²]. In terms of m1
and m2, find Cov(N(s), N(t)) for s < t.


97. Consider a conditional Poisson process in which the rate L is, as in Example 5.29, 
gamma distributed with parameters m and p. Find the conditional density function 
of L given that N(t) =n. 
6.1 Introduction


In this chapter we consider a class of probability models that has a wide variety of 
applications in the real world. The members of this class are the continuous-time 
analogs of the Markov chains of Chapter 4 and as such are characterized by the 
Markovian property that, given the present state, the future is independent of 
the past. 

One example of a continuous-time Markov chain has already been met. This
is the Poisson process of Chapter 5. For if we let the total number of arrivals
by time t (that is, N(t)) be the state of the process at time t, then the Poisson
process is a continuous-time Markov chain having states 0, 1, 2, ... that always
proceeds from state n to state n + 1, where n ≥ 0. Such a process is known
as a pure birth process since when a transition occurs the state of the system is
always increased by one. More generally, an exponential model that can go (in
one transition) only from state n to either state n − 1 or state n + 1 is called a
birth and death model. For such a model, transitions from state n to state n + 1
are designated as births, and those from n to n − 1 as deaths. Birth and death
models have wide applicability in the study of biological systems and in the study
of waiting line systems in which the state represents the number of customers in
the system. These models will be studied extensively in this chapter.

In Section 6.2 we define continuous-time Markov chains and then relate them
to the discrete-time Markov chains of Chapter 4. In Section 6.3 we consider 
birth and death processes and in Section 6.4 we derive two sets of differential 
equations—the forward and backward equations—that describe the probability 
laws for the system. The material in Section 6.5 is concerned with determining 
the limiting (or long-run) probabilities connected with a continuous-time Markov 
chain. In Section 6.6 we consider the topic of time reversibility. We show that all 
birth and death processes are time reversible, and then illustrate the importance 
of this observation to queueing systems. In the final section we show how to 
“uniformize” Markov chains, a technique useful for numerical computations. 


6.2 Continuous-Time Markov Chains 


Suppose we have a continuous-time stochastic process {X(t), t ≥ 0} taking on
values in the set of nonnegative integers. In analogy with the definition of a
discrete-time Markov chain, given in Chapter 4, we say that the process {X(t), t ≥
0} is a continuous-time Markov chain if for all s, t ≥ 0 and nonnegative integers
i, j, x(u), 0 ≤ u < s


P{X(t + s) = j|X(s) = i, X(u) = x(u), 0 ≤ u < s}
    = P{X(t + s) = j|X(s) = i}


In other words, a continuous-time Markov chain is a stochastic process having
the Markovian property that the conditional distribution of the future X(t + s)
given the present X(s) and the past X(u), 0 ≤ u < s, depends only on the present
and is independent of the past. If, in addition,


P{X(t + s) = j|X(s) = i}


is independent of s, then the continuous-time Markov chain is said to have
stationary or homogeneous transition probabilities.

All Markov chains considered in this text will be assumed to have stationary 
transition probabilities. 

Suppose that a continuous-time Markov chain enters state i at some time, say, 
time 0, and suppose that the process does not leave state i (that is, a transition does 
not occur) during the next ten minutes. What is the probability that the process 
will not leave state i during the following five minutes? Since the process is in 
state i at time 10 it follows, by the Markovian property, that the probability 
that it remains in that state during the interval [10, 15] is just the (unconditional) 
probability that it stays in state i for at least five minutes. That is, if we let Ti denote
the amount of time that the process stays in state i before making a transition
into a different state, then

P{Ti > 15|Ti > 10} = P{Ti > 5}

or, in general, by the same reasoning,

P{Ti > s + t|Ti > s} = P{Ti > t}

for all s, t ≥ 0. Hence, the random variable Ti is memoryless and must thus (see
Section 5.2.2) be exponentially distributed.
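
The general identity can be checked directly from the exponential survival function, for which the probability of exceeding t is e raised to the power −(rate)(t). The sketch below does this numerically. The rate of 0.2 per minute and the helper names are hypothetical, chosen only to mirror the ten-minute and five-minute figures used above.

```python
import math

def survival(rate, t):
    """P{T > t} for an exponential random variable T with the given rate."""
    return math.exp(-rate * t)

def conditional_survival(rate, s, t):
    """P{T > s + t | T > s}, computed as P{T > s + t} / P{T > s}."""
    return survival(rate, s + t) / survival(rate, s)

rate = 0.2  # hypothetical rate, in transitions per minute
lhs = conditional_survival(rate, 10.0, 5.0)  # P{T > 15 | T > 10}
rhs = survival(rate, 5.0)                    # P{T > 5}
print(abs(lhs - rhs) < 1e-12)
```

The two quantities agree up to floating-point rounding for any choice of rate, s, and t, which is the memoryless property in numerical form.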

In fact, the preceding gives us another way of defining a continuous-time
Markov chain. Namely, it is a stochastic process having the properties that each
time it enters state i


(i) the amount of time it spends in that state before making a transition into a different
state is exponentially distributed with mean, say, 1/vi, and

(ii) when the process leaves state i, it next enters state j with some probability, say, Pij.
Of course, the Pij must satisfy


\[ P_{ii} = 0, \quad \text{all } i \]
\[ \sum_j P_{ij} = 1, \quad \text{all } i \]


In other words, a continuous-time Markov chain is a stochastic process that 
moves from state to state in accordance with a (discrete-time) Markov chain, 
but is such that the amount of time it spends in each state, before proceeding to 
the next state, is exponentially distributed. In addition, the amount of time the 
process spends in state i, and the next state visited, must be independent random
variables. For if the next state visited were dependent on Ti, then information
as to how long the process has already been in state i would be relevant to the 
prediction of the next state—and this contradicts the Markovian assumption. 
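
This description translates directly into a simulation recipe: draw an exponential holding time for the current state, then, independently, draw the next state from that state's row of jump probabilities. The following is a minimal sketch assuming a finite state space. The function name, the three-state rates, and the jump probabilities are hypothetical and serve only as an illustration.

```python
import random

def simulate_ctmc(rates, jump_probs, start, horizon, rng):
    """Simulate a continuous-time Markov chain up to time `horizon`.

    rates[i]      -- v_i, the rate of the exponential holding time in state i
    jump_probs[i] -- dict mapping each possible next state j to P_ij
    Returns a list of (entry_time, state) pairs.
    """
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        # Holding time in the current state: exponential with mean 1 / v_i.
        t += rng.expovariate(rates[state])
        if t >= horizon:
            return path
        # Next state, drawn independently of the holding time just generated.
        next_states = list(jump_probs[state])
        weights = [jump_probs[state][j] for j in next_states]
        state = rng.choices(next_states, weights=weights)[0]
        path.append((t, state))

# Hypothetical three-state example (values are illustrative only).
rates = {0: 1.0, 1: 2.0, 2: 3.0}
jump_probs = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {0: 1.0}}
path = simulate_ctmc(rates, jump_probs, start=0, horizon=10.0, rng=random.Random(42))
```

Because the holding time and the next state come from separate random draws, the sketch respects the independence requirement stated above.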


Example 6.1 (A Shoe Shine Shop) Consider a shoe shine establishment consisting
of two chairs—chair 1 and chair 2. A customer upon arrival goes initially to chair
1 where his shoes are cleaned and polish is applied. After this is done the customer
moves on to chair 2 where the polish is buffed. The service times at the two
chairs are assumed to be independent random variables that are exponentially
distributed with respective rates μ1 and μ2. Suppose that potential customers
arrive in accordance with a Poisson process having rate λ, and that a potential
customer will enter the system only if both chairs are empty.

The preceding model can be analyzed as a continuous-time Markov chain, but
first we must decide upon an appropriate state space. Since a potential customer
will enter the system only if there are no other customers present, it follows that
there will always be either 0 or 1 customers in the system. However, if there is
1 customer in the system, then we would also need to know which chair he was
presently in. Hence, an appropriate state space might consist of the three states
0, 1, and 2 where the states have the following interpretation:


State    Interpretation
0        system is empty
1        a customer is in chair 1
2        a customer is in chair 2


We leave it as an exercise for you to verify that

\[ v_0 = \lambda, \quad v_1 = \mu_1, \quad v_2 = \mu_2, \]
\[ P_{01} = P_{12} = P_{20} = 1 \] ■


6.3 Birth and Death Processes 


Consider a system whose state at any time is represented by the number of people
in the system at that time. Suppose that whenever there are $n$ people in the system,
then (i) new arrivals enter the system at an exponential rate $\lambda_n$, and (ii) people
leave the system at an exponential rate $\mu_n$. That is, whenever there are $n$ persons
in the system, then the time until the next arrival is exponentially distributed with
mean $1/\lambda_n$ and is independent of the time until the next departure, which is itself
exponentially distributed with mean $1/\mu_n$. Such a system is called a birth and
death process. The parameters $\{\lambda_n\}_{n=0}^{\infty}$ and $\{\mu_n\}_{n=1}^{\infty}$ are called, respectively, the
arrival (or birth) and departure (or death) rates.

Thus, a birth and death process is a continuous-time Markov chain with states
$\{0, 1, \ldots\}$ for which transitions from state $n$ may go only to either state $n - 1$ or
state $n + 1$. The relationships between the birth and death rates and the state
transition rates and probabilities are

\[ v_0 = \lambda_0, \]
\[ v_i = \lambda_i + \mu_i, \quad i > 0 \]
\[ P_{01} = 1, \]
\[ P_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}, \quad i > 0 \]
\[ P_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i}, \quad i > 0 \]


The preceding follows, because if there are $i$ in the system, then the next state will
be $i + 1$ if a birth occurs before a death, and the probability that an exponential
random variable with rate $\lambda_i$ will occur earlier than an (independent) exponential
with rate $\mu_i$ is $\lambda_i/(\lambda_i + \mu_i)$. Moreover, the time until either a birth or a death
occurs is exponentially distributed with rate $\lambda_i + \mu_i$ (and so, $v_i = \lambda_i + \mu_i$).

Example 6.2 (The Poisson Process) Consider a birth and death process for which

\[ \mu_n = 0, \quad \text{for all } n \ge 0 \]
\[ \lambda_n = \lambda, \quad \text{for all } n \ge 0 \]

This is a process in which departures never occur, and the time between successive
arrivals is exponential with mean $1/\lambda$. Hence, this is just the Poisson process. ■


A birth and death process for which $\mu_n = 0$ for all $n$ is called a pure birth
process. Another pure birth process is given by the next example.


Example 6.3 (A Birth Process with Linear Birth Rate) Consider a population
whose members can give birth to new members but cannot die. If each member
acts independently of the others and takes an exponentially distributed amount
of time, with mean $1/\lambda$, to give birth, then if $X(t)$ is the population size at time $t$,
then $\{X(t), t \ge 0\}$ is a pure birth process with $\lambda_n = n\lambda$, $n \ge 0$. This follows since
if the population consists of $n$ persons and each gives birth at an exponential
rate $\lambda$, then the total rate at which births occur is $n\lambda$. This pure birth process is
known as a Yule process after G. Yule, who used it in his mathematical theory
of evolution. ■


Example 6.4 (A Linear Growth Model with Immigration) A model in which

\[ \mu_n = n\mu, \quad n \ge 1 \]
\[ \lambda_n = n\lambda + \theta, \quad n \ge 0 \]

is called a linear growth process with immigration. Such processes occur naturally
in the study of biological reproduction and population growth. Each individual
in the population is assumed to give birth at an exponential rate $\lambda$; in addition,
there is an exponential rate of increase $\theta$ of the population due to an external
source such as immigration. Hence, the total birth rate when there are $n$ persons
in the system is $n\lambda + \theta$. Deaths are assumed to occur at an exponential rate $\mu$ for
each member of the population, so $\mu_n = n\mu$.

Let $X(t)$ denote the population size at time $t$. Suppose that $X(0) = i$ and let

\[ M(t) = E[X(t)] \]

We will determine $M(t)$ by deriving and then solving a differential equation that
it satisfies.

We start by deriving an equation for $M(t + h)$ by conditioning on $X(t)$. This
yields

\[ M(t + h) = E[X(t + h)] = E\big[E[X(t + h) \mid X(t)]\big] \]

Now, given the size of the population at time $t$ then, ignoring events whose
probability is $o(h)$, the population at time $t + h$ will either increase in size by 1 if
a birth or an immigration occurs in $(t, t + h)$, or decrease by 1 if a death occurs
in this interval, or remain the same if neither of these two possibilities occurs.
That is, given $X(t)$,

\[ X(t + h) = \begin{cases}
X(t) + 1, & \text{with probability } [\theta + X(t)\lambda]h + o(h) \\
X(t) - 1, & \text{with probability } X(t)\mu h + o(h) \\
X(t), & \text{with probability } 1 - [\theta + X(t)\lambda + X(t)\mu]h + o(h)
\end{cases} \]


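Therefore,

\[ E[X(t + h) \mid X(t)] = X(t) + [\theta + X(t)\lambda - X(t)\mu]h + o(h) \]

Taking expectations yields

\[ M(t + h) = M(t) + (\lambda - \mu)M(t)h + \theta h + o(h) \]

or, equivalently,

\[ \frac{M(t + h) - M(t)}{h} = (\lambda - \mu)M(t) + \theta + \frac{o(h)}{h} \]

Taking the limit as $h \to 0$ yields the differential equation

\[ M'(t) = (\lambda - \mu)M(t) + \theta \qquad (6.1) \]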
If we now define the function $h(t)$ by

\[ h(t) = (\lambda - \mu)M(t) + \theta \]

then

\[ h'(t) = (\lambda - \mu)M'(t) \]

Therefore, Differential Equation (6.1) can be rewritten as

\[ \frac{h'(t)}{\lambda - \mu} = h(t) \]

or

\[ \frac{h'(t)}{h(t)} = \lambda - \mu \]

Integration yields

\[ \log[h(t)] = (\lambda - \mu)t + c \]

or

\[ h(t) = Ke^{(\lambda - \mu)t} \]

Putting this back in terms of $M(t)$ gives

\[ \theta + (\lambda - \mu)M(t) = Ke^{(\lambda - \mu)t} \]

To determine the value of the constant $K$, we use the fact that $M(0) = i$ and
evaluate the preceding at $t = 0$. This gives

\[ \theta + (\lambda - \mu)i = K \]

Substituting this back in the preceding equation for $M(t)$ yields the following
solution for $M(t)$:

\[ M(t) = \frac{\theta}{\lambda - \mu}\left[e^{(\lambda - \mu)t} - 1\right] + ie^{(\lambda - \mu)t} \]

Note that we have implicitly assumed that $\lambda \ne \mu$. If $\lambda = \mu$, then Differential
Equation (6.1) reduces to

\[ M'(t) = \theta \qquad (6.2) \]

Integrating (6.2) and using that $M(0) = i$ gives the solution

\[ M(t) = \theta t + i \] ■
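The solution of Equation (6.1) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the parameter values are illustrative assumptions. It evaluates the closed form for $M(t)$ and compares it with a classical fourth-order Runge-Kutta integration of $M'(t) = (\lambda - \mu)M(t) + \theta$ started from $M(0) = i$.

```python
import math


def mean_closed_form(t, lam, mu, theta, i):
    """Closed-form M(t) = E[X(t)] given X(0) = i, from solving Equation (6.1)."""
    if lam == mu:
        # Degenerate case (6.2): M'(t) = theta, so the mean grows linearly.
        return theta * t + i
    r = lam - mu
    return theta / r * (math.exp(r * t) - 1.0) + i * math.exp(r * t)


def mean_by_rk4(t, lam, mu, theta, i, steps=20000):
    """Integrate M'(t) = (lam - mu) * M(t) + theta with M(0) = i by classical RK4."""
    h = t / steps
    m = float(i)

    def slope(value):
        return (lam - mu) * value + theta

    for _ in range(steps):
        k1 = slope(m)
        k2 = slope(m + 0.5 * h * k1)
        k3 = slope(m + 0.5 * h * k2)
        k4 = slope(m + h * k3)
        m += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return m


if __name__ == "__main__":
    # Hypothetical rates chosen only for illustration.
    print(mean_closed_form(2.0, 1.0, 1.5, 0.7, 3))
    print(mean_by_rk4(2.0, 1.0, 1.5, 0.7, 3))
```

The two printed values should agree to many decimal places; the same comparison with `lam == mu` exercises the linear solution $M(t) = \theta t + i$.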


Example 6.5 (The Queueing System M/M/1) Suppose that customers arrive at
a single-server service station in accordance with a Poisson process having rate
$\lambda$. That is, the times between successive arrivals are independent exponential
random variables having mean $1/\lambda$. Upon arrival, each customer goes directly
into service if the server is free; if not, then the customer joins the queue (that
is, he waits in line). When the server finishes serving a customer, the customer
leaves the system and the next customer in line, if there are any waiting, enters the
service. The successive service times are assumed to be independent exponential
random variables having mean $1/\mu$.

The preceding is known as the M/M/1 queueing system. The first M refers to
the fact that the interarrival process is Markovian (since it is a Poisson process)
and the second to the fact that the service distribution is exponential (and, hence,
Markovian). The 1 refers to the fact that there is a single server.

If we let $X(t)$ denote the number in the system at time $t$ then $\{X(t), t \ge 0\}$ is a
birth and death process with

\[ \mu_n = \mu, \quad n \ge 1 \]
\[ \lambda_n = \lambda, \quad n \ge 0 \] ■

Example 6.6 (A Multiserver Exponential Queueing System) Consider an exponential
queueing system in which there are $s$ servers available, each serving at rate
$\mu$. An entering customer first waits in line and then goes to the first free server.
This is a birth and death process with parameters

\[ \mu_n = \begin{cases} n\mu, & 1 \le n \le s \\ s\mu, & n > s \end{cases} \]
\[ \lambda_n = \lambda, \quad n \ge 0 \]

To see why this is true, reason as follows: If there are $n$ customers in the system,
where $n \le s$, then $n$ servers will be busy. Since each of these servers works at
rate $\mu$, the total departure rate will be $n\mu$. On the other hand, if there are $n$
customers in the system, where $n > s$, then all $s$ of the servers will be busy, and
thus the total departure rate will be $s\mu$. This is known as an M/M/s queueing
model. ■


Consider now a general birth and death process with birth rates $\{\lambda_n\}$ and death
rates $\{\mu_n\}$, where $\mu_0 = 0$, and let $T_i$ denote the time, starting from state $i$, it takes
for the process to enter state $i + 1$, $i \ge 0$. We will recursively compute $E[T_i]$,
$i \ge 0$, by starting with $i = 0$. Since $T_0$ is exponential with rate $\lambda_0$, we have

\[ E[T_0] = \frac{1}{\lambda_0} \]

For $i > 0$, we condition on whether the first transition takes the process into state
$i - 1$ or $i + 1$. That is, let

\[ I_i = \begin{cases} 1, & \text{if the first transition from } i \text{ is to } i + 1 \\ 0, & \text{if the first transition from } i \text{ is to } i - 1 \end{cases} \]

and note that

\[ E[T_i \mid I_i = 1] = \frac{1}{\lambda_i + \mu_i}, \]
\[ E[T_i \mid I_i = 0] = \frac{1}{\lambda_i + \mu_i} + E[T_{i-1}] + E[T_i] \qquad (6.3) \]

This follows since, independent of whether the first transition is from a birth
or death, the time until it occurs is exponential with rate $\lambda_i + \mu_i$; if this first
transition is a birth, then the population size is at $i + 1$, so no additional time
is needed; whereas if it is a death, then the population size becomes $i - 1$ and the
additional time needed to reach $i + 1$ is equal to the time it takes to return to
state $i$ (this has mean $E[T_{i-1}]$) plus the additional time it then takes to reach $i + 1$
(this has mean $E[T_i]$). Hence, since the probability that the first transition is a
birth is $\lambda_i/(\lambda_i + \mu_i)$, we see that

\[ E[T_i] = \frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\left(E[T_{i-1}] + E[T_i]\right) \]

or, equivalently,

\[ E[T_i] = \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i}E[T_{i-1}], \quad i \ge 1 \]

Starting with $E[T_0] = 1/\lambda_0$, the preceding yields an efficient method to
successively compute $E[T_1]$, $E[T_2]$, and so on.

Suppose now that we wanted to determine the expected time to go from state
$i$ to state $j$ where $i < j$. This can be accomplished using the preceding by noting
that this quantity will equal $E[T_i] + E[T_{i+1}] + \cdots + E[T_{j-1}]$.
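The recursion for $E[T_i]$ translates directly into a short loop. The sketch below is illustrative only: the function names are hypothetical, and the rates are passed as Python callables `lam(n)` and `mu(n)`, which is an assumption about how one might represent $\{\lambda_n\}$ and $\{\mu_n\}$.

```python
def expected_up_times(lam, mu, n):
    """Return [E[T_0], ..., E[T_n]] using E[T_i] = 1/lam(i) + (mu(i)/lam(i)) * E[T_{i-1}]."""
    times = [1.0 / lam(0)]  # E[T_0] = 1 / lambda_0
    for i in range(1, n + 1):
        times.append(1.0 / lam(i) + mu(i) / lam(i) * times[i - 1])
    return times


def expected_time_between(lam, mu, k, j):
    """Expected time to go from state k to state j (k < j): E[T_k] + ... + E[T_{j-1}]."""
    if not 0 <= k < j:
        raise ValueError("requires 0 <= k < j")
    return sum(expected_up_times(lam, mu, j - 1)[k:j])


if __name__ == "__main__":
    # Hypothetical constant rates, chosen only to illustrate the calls.
    birth = lambda n: 2.0
    death = lambda n: 1.0
    print(expected_up_times(birth, death, 3))
    print(expected_time_between(birth, death, 1, 3))
```

Each $E[T_i]$ is obtained from its predecessor in constant time, so the expected time from $k$ to $j$ costs $O(j)$ operations.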


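Example 6.7 For the birth and death process having parameters $\lambda_i \equiv \lambda$, $\mu_i \equiv \mu$,

\[ E[T_i] = \frac{1}{\lambda} + \frac{\mu}{\lambda}E[T_{i-1}] = \frac{1}{\lambda}\left(1 + \mu E[T_{i-1}]\right) \]

Starting with $E[T_0] = 1/\lambda$, we see that

\[ E[T_1] = \frac{1}{\lambda}\left(1 + \frac{\mu}{\lambda}\right), \]
\[ E[T_2] = \frac{1}{\lambda}\left[1 + \frac{\mu}{\lambda} + \left(\frac{\mu}{\lambda}\right)^2\right] \]

and, in general,

\[ E[T_i] = \frac{1}{\lambda}\left[1 + \frac{\mu}{\lambda} + \left(\frac{\mu}{\lambda}\right)^2 + \cdots + \left(\frac{\mu}{\lambda}\right)^i\right]
          = \frac{1 - (\mu/\lambda)^{i+1}}{\lambda - \mu}, \quad i \ge 0 \]

The expected time to reach state $j$, starting at state $k$, $k < j$, is

\[ E[\text{time to go from } k \text{ to } j] = \sum_{i=k}^{j-1} E[T_i]
   = \frac{j - k}{\lambda - \mu} - \frac{(\mu/\lambda)^{k+1}}{\lambda - \mu} \cdot \frac{1 - (\mu/\lambda)^{j-k}}{1 - \mu/\lambda} \]

The foregoing assumes that $\lambda \ne \mu$. If $\lambda = \mu$, then

\[ E[T_i] = \frac{i + 1}{\lambda}, \]
\[ E[\text{time to go from } k \text{ to } j] = \frac{j(j + 1) - k(k + 1)}{2\lambda} \] ■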


We can also compute the variance of the time to go from 0 to $i + 1$ by utilizing
the conditional variance formula. First note that Equation (6.3) can be written as

\[ E[T_i \mid I_i] = \frac{1}{\lambda_i + \mu_i} + (1 - I_i)\left(E[T_{i-1}] + E[T_i]\right) \]

Thus,

\[ \mathrm{Var}(E[T_i \mid I_i]) = \left(E[T_{i-1}] + E[T_i]\right)^2 \mathrm{Var}(I_i)
   = \left(E[T_{i-1}] + E[T_i]\right)^2 \frac{\mu_i \lambda_i}{(\mu_i + \lambda_i)^2} \qquad (6.4) \]

where $\mathrm{Var}(I_i)$ is as shown since $I_i$ is a Bernoulli random variable with parameter
$p = \lambda_i/(\lambda_i + \mu_i)$. Also, note that if we let $X_i$ denote the time until the transition
from $i$ occurs, then

\[ \mathrm{Var}(T_i \mid I_i = 1) = \mathrm{Var}(X_i \mid I_i = 1) = \mathrm{Var}(X_i) = \frac{1}{(\lambda_i + \mu_i)^2} \qquad (6.5) \]

where the preceding uses the fact that the time until transition is independent of
the next state visited. Also,

\[ \mathrm{Var}(T_i \mid I_i = 0) = \mathrm{Var}(X_i + \text{time to get back to } i + \text{time to then reach } i + 1) \]
\[ \qquad = \mathrm{Var}(X_i) + \mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i) \qquad (6.6) \]

where the foregoing uses the fact that the three random variables are independent.
We can rewrite Equations (6.5) and (6.6) as

\[ \mathrm{Var}(T_i \mid I_i) = \mathrm{Var}(X_i) + (1 - I_i)\left[\mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i)\right] \]

so

\[ E[\mathrm{Var}(T_i \mid I_i)] = \frac{1}{(\mu_i + \lambda_i)^2} + \frac{\mu_i}{\mu_i + \lambda_i}\left[\mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i)\right] \qquad (6.7) \]

Hence, using the conditional variance formula, which states that $\mathrm{Var}(T_i)$ is the
sum of Equations (6.7) and (6.4), we obtain

\[ \mathrm{Var}(T_i) = \frac{1}{(\mu_i + \lambda_i)^2} + \frac{\mu_i}{\mu_i + \lambda_i}\left[\mathrm{Var}(T_{i-1}) + \mathrm{Var}(T_i)\right]
   + \frac{\mu_i \lambda_i}{(\mu_i + \lambda_i)^2}\left(E[T_{i-1}] + E[T_i]\right)^2 \]

or, equivalently,

\[ \mathrm{Var}(T_i) = \frac{1}{\lambda_i(\lambda_i + \mu_i)} + \frac{\mu_i}{\lambda_i}\mathrm{Var}(T_{i-1})
   + \frac{\mu_i}{\mu_i + \lambda_i}\left(E[T_{i-1}] + E[T_i]\right)^2 \]

Starting with $\mathrm{Var}(T_0) = 1/\lambda_0^2$ and using the former recursion to obtain the
expectations, we can recursively compute $\mathrm{Var}(T_i)$. In addition, if we want the variance
of the time to reach state $j$, starting from state $k$, $k < j$, then this can be expressed
as the time to go from $k$ to $k + 1$ plus the additional time to go from $k + 1$ to
$k + 2$, and so on. Since, by the Markovian property, these successive random
variables are independent, it follows that

\[ \mathrm{Var}(\text{time to go from } k \text{ to } j) = \sum_{i=k}^{j-1} \mathrm{Var}(T_i) \]
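The variance recursion can be run alongside the recursion for the means. The following is a sketch under the assumption that the rates are supplied as callables `lam(n)` and `mu(n)`; the function name is hypothetical. It returns both $E[T_i]$ and $\mathrm{Var}(T_i)$ for $i = 0, \ldots, n$.

```python
def up_time_moments(lam, mu, n):
    """Return (means, variances) of T_0..T_n for a birth and death process.

    Uses E[T_i] = 1/l + (m/l) E[T_{i-1}] and
    Var(T_i) = 1/(l (l + m)) + (m/l) Var(T_{i-1}) + m/(m + l) (E[T_{i-1}] + E[T_i])^2,
    where l = lam(i) and m = mu(i).
    """
    means = [1.0 / lam(0)]
    variances = [1.0 / lam(0) ** 2]  # T_0 is exponential with rate lambda_0
    for i in range(1, n + 1):
        l, m = lam(i), mu(i)
        mean_i = 1.0 / l + m / l * means[i - 1]
        var_i = (1.0 / (l * (l + m))
                 + m / l * variances[i - 1]
                 + m / (m + l) * (means[i - 1] + mean_i) ** 2)
        means.append(mean_i)
        variances.append(var_i)
    return means, variances


if __name__ == "__main__":
    # Hypothetical constant rates used only as an illustration.
    means, variances = up_time_moments(lambda n: 2.0, lambda n: 1.0, 4)
    # Variance of the time to go from state 1 to state 4 is Var(T_1) + Var(T_2) + Var(T_3).
    print(sum(variances[1:4]))
```

As a sanity check, setting every death rate to zero reduces each $T_i$ to a single exponential holding time, and the recursion then returns $\mathrm{Var}(T_i) = 1/\lambda_i^2$.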


6.4 The Transition Probability Function $P_{ij}(t)$

Let

\[ P_{ij}(t) = P\{X(t + s) = j \mid X(s) = i\} \]

denote the probability that a process presently in state $i$ will be in state $j$ a
time $t$ later. These quantities are often called the transition probabilities of the
continuous-time Markov chain.

We can explicitly determine $P_{ij}(t)$ in the case of a pure birth process having
distinct birth rates. For such a process, let $X_k$ denote the time the process spends
in state $k$ before making a transition into state $k + 1$, $k \ge 1$. Suppose that the
process is presently in state $i$, and let $j > i$. Then, as $X_i$ is the time it spends in
state $i$ before moving to state $i + 1$, and $X_{i+1}$ is the time it then spends in state
$i + 1$ before moving to state $i + 2$, and so on, it follows that $\sum_{k=i}^{j-1} X_k$ is the time
it takes until the process enters state $j$. Now, if the process has not yet entered
state $j$ by time $t$, then its state at time $t$ is smaller than $j$, and vice versa. That is,

\[ X(t) < j \iff X_i + \cdots + X_{j-1} > t \]

Therefore, for $i < j$, we have for a pure birth process that

\[ P\{X(t) < j \mid X(0) = i\} = P\Big\{\sum_{k=i}^{j-1} X_k > t\Big\} \]

However, since $X_i, \ldots, X_{j-1}$ are independent exponential random variables with
respective rates $\lambda_i, \ldots, \lambda_{j-1}$, we obtain from the preceding and Equation (5.9),
which gives the tail distribution function of $\sum_{k=i}^{j-1} X_k$, that

\[ P\{X(t) < j \mid X(0) = i\} = \sum_{k=i}^{j-1} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j-1} \frac{\lambda_r}{\lambda_r - \lambda_k} \]

Replacing $j$ by $j + 1$ in the preceding gives

\[ P\{X(t) < j + 1 \mid X(0) = i\} = \sum_{k=i}^{j} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j} \frac{\lambda_r}{\lambda_r - \lambda_k} \]

Since

\[ P\{X(t) = j \mid X(0) = i\} = P\{X(t) < j + 1 \mid X(0) = i\} - P\{X(t) < j \mid X(0) = i\} \]

and since $P_{ii}(t) = P\{X_i > t\} = e^{-\lambda_i t}$, we have shown the following.


Proposition 6.1 For a pure birth process having $\lambda_i \ne \lambda_j$ when $i \ne j$

\[ P_{ij}(t) = \sum_{k=i}^{j} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j} \frac{\lambda_r}{\lambda_r - \lambda_k}
   - \sum_{k=i}^{j-1} e^{-\lambda_k t} \prod_{r \ne k,\, r=i}^{j-1} \frac{\lambda_r}{\lambda_r - \lambda_k}, \quad i < j \]

\[ P_{ii}(t) = e^{-\lambda_i t} \]

Example 6.8 Consider the Yule process, which is a pure birth process in which
each individual in the population independently gives birth at rate $\lambda$, and so
$\lambda_n = n\lambda$, $n \ge 1$. Letting $i = 1$, we obtain from Proposition 6.1

\[ P_{1j}(t) = \sum_{k=1}^{j} e^{-k\lambda t} \prod_{r \ne k,\, r=1}^{j} \frac{r}{r - k}
   - \sum_{k=1}^{j-1} e^{-k\lambda t} \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k} \]
\[ \quad = e^{-j\lambda t} \prod_{r=1}^{j-1} \frac{r}{r - j}
   + \sum_{k=1}^{j-1} e^{-k\lambda t} \left( \prod_{r \ne k,\, r=1}^{j} \frac{r}{r - k} - \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k} \right) \]
\[ \quad = e^{-j\lambda t}(-1)^{j-1}
   + \sum_{k=1}^{j-1} e^{-k\lambda t} \left( \frac{j}{j - k} - 1 \right) \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k} \]

Now,

\[ \frac{k}{j - k} \prod_{r \ne k,\, r=1}^{j-1} \frac{r}{r - k}
   = \frac{(j - 1)!}{(1 - k)(2 - k) \cdots (k - 1 - k)(j - k)!}
   = (-1)^{k-1} \binom{j - 1}{k - 1} \]

so

\[ P_{1j}(t) = \sum_{k=1}^{j} \binom{j - 1}{k - 1} e^{-k\lambda t} (-1)^{k-1}
   = e^{-\lambda t} \sum_{i=0}^{j-1} \binom{j - 1}{i} e^{-i\lambda t} (-1)^{i}
   = e^{-\lambda t}\left(1 - e^{-\lambda t}\right)^{j-1} \]

Thus, starting with a single individual, the population size at time $t$ has a
geometric distribution with mean $e^{\lambda t}$. If the population starts with $i$ individuals, then
we can regard each of these individuals as starting her own independent Yule
process, and so the population at time $t$ will be the sum of $i$ independent and
identically distributed geometric random variables with parameter $e^{-\lambda t}$. But this
means that the conditional distribution of $X(t)$, given that $X(0) = i$, is the same
as the distribution of the number of times that a coin that lands heads on each
flip with probability $e^{-\lambda t}$ must be flipped to amass a total of $i$ heads. Hence, the
population size at time $t$ has a negative binomial distribution with parameters $i$
and $e^{-\lambda t}$, so

\[ P_{ij}(t) = \binom{j - 1}{i - 1} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{j-i}, \quad j \ge i \ge 1 \]

(We could, of course, have used Proposition 6.1 to immediately obtain an
equation for $P_{ij}(t)$, rather than just using it for $P_{1j}(t)$, but the algebra that would
have then been needed to show the equivalence of the resulting expression to the
preceding result is somewhat involved.) ■
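The equivalence that Example 6.8 describes as algebraically involved is easy to check numerically. The sketch below (function names are hypothetical; the birth rates are assumed to be given as a callable `lam(n)` with distinct values) evaluates Proposition 6.1 directly, so that for Yule rates $\lambda_n = n\lambda$ its output can be compared with the negative binomial form.

```python
import math


def hypoexponential_tail(rates, t):
    """P{X_1 + ... + X_m > t} for independent exponentials with distinct rates."""
    total = 0.0
    for k, rate_k in enumerate(rates):
        product = 1.0
        for r, rate_r in enumerate(rates):
            if r != k:
                product *= rate_r / (rate_r - rate_k)
        total += math.exp(-rate_k * t) * product
    return total


def pure_birth_transition(lam, i, j, t):
    """P_ij(t) for a pure birth process with distinct birth rates lam(n) (Proposition 6.1)."""
    if j < i:
        return 0.0  # a pure birth process never decreases
    if j == i:
        return math.exp(-lam(i) * t)
    rates = [lam(k) for k in range(i, j + 1)]
    # P{X(t) < j + 1 | X(0) = i} - P{X(t) < j | X(0) = i}
    return hypoexponential_tail(rates, t) - hypoexponential_tail(rates[:-1], t)


if __name__ == "__main__":
    yule = lambda n: 0.5 * n  # hypothetical Yule rates with lambda = 0.5
    p = math.exp(-0.5)        # e^{-lambda t} at t = 1
    print(pure_birth_transition(yule, 2, 4, 1.0))
    print(3 * p ** 2 * (1 - p) ** 2)  # negative binomial form for i = 2, j = 4
```

The products alternate in sign and grow quickly with $j - i$, so this direct evaluation is only advisable for moderate $j - i$; for large gaps the cancellation would degrade floating-point accuracy.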


We shall now derive a set of differential equations that the transition
probabilities $P_{ij}(t)$ satisfy in a general continuous-time Markov chain. However, first
we need a definition and a pair of lemmas.

For any pair of states $i$ and $j$, let

\[ q_{ij} = v_i P_{ij} \]

Since $v_i$ is the rate at which the process makes a transition when in state $i$ and
$P_{ij}$ is the probability that this transition is into state $j$, it follows that $q_{ij}$ is the
rate, when in state $i$, at which the process makes a transition into state $j$. The
quantities $q_{ij}$ are called the instantaneous transition rates. Since

\[ v_i = \sum_j v_i P_{ij} = \sum_j q_{ij} \]

and hence $P_{ij} = q_{ij}/v_i$, it follows that specifying the instantaneous transition
rates determines the parameters of the continuous-time Markov chain.


Lemma 6.2

(a) $\displaystyle \lim_{h \to 0} \frac{1 - P_{ii}(h)}{h} = v_i$

(b) $\displaystyle \lim_{h \to 0} \frac{P_{ij}(h)}{h} = q_{ij}$ when $i \ne j$

Proof. We first note that since the amount of time until a transition occurs is
exponentially distributed it follows that the probability of two or more transitions
in a time $h$ is $o(h)$. Thus, $1 - P_{ii}(h)$, the probability that a process in state $i$ at time
0 will not be in state $i$ at time $h$, equals the probability that a transition occurs
within time $h$ plus something small compared to $h$. Therefore,

\[ 1 - P_{ii}(h) = v_i h + o(h) \]

and part (a) is proven. To prove part (b), we note that $P_{ij}(h)$, the probability that
the process goes from state $i$ to state $j$ in a time $h$, equals the probability that a
transition occurs in this time multiplied by the probability that the transition is
into state $j$, plus something small compared to $h$. That is,

\[ P_{ij}(h) = h v_i P_{ij} + o(h) \]

and part (b) is proven. ■


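Lemma 6.3 For all $s \ge 0$, $t \ge 0$,

\[ P_{ij}(t + s) = \sum_{k=0}^{\infty} P_{ik}(t) P_{kj}(s) \qquad (6.8) \]

Proof. In order for the process to go from state $i$ to state $j$ in time $t + s$, it must
be somewhere at time $t$ and thus

\[ P_{ij}(t + s) = P\{X(t + s) = j \mid X(0) = i\} \]
\[ \quad = \sum_{k=0}^{\infty} P\{X(t + s) = j, X(t) = k \mid X(0) = i\} \]
\[ \quad = \sum_{k=0}^{\infty} P\{X(t + s) = j \mid X(t) = k, X(0) = i\} \cdot P\{X(t) = k \mid X(0) = i\} \]
\[ \quad = \sum_{k=0}^{\infty} P\{X(t + s) = j \mid X(t) = k\} \cdot P\{X(t) = k \mid X(0) = i\} \]
\[ \quad = \sum_{k=0}^{\infty} P_{kj}(s) P_{ik}(t) \]

and the proof is completed. ■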


The set of Equations (6.8) is known as the Chapman-Kolmogorov equations.
From Lemma 6.3, we obtain

\[ P_{ij}(h + t) - P_{ij}(t) = \sum_{k=0}^{\infty} P_{ik}(h) P_{kj}(t) - P_{ij}(t) \]
\[ \quad = \sum_{k \ne i} P_{ik}(h) P_{kj}(t) - [1 - P_{ii}(h)] P_{ij}(t) \]

and thus

\[ \lim_{h \to 0} \frac{P_{ij}(t + h) - P_{ij}(t)}{h}
   = \lim_{h \to 0} \left\{ \sum_{k \ne i} \frac{P_{ik}(h)}{h} P_{kj}(t) - \left[ \frac{1 - P_{ii}(h)}{h} \right] P_{ij}(t) \right\} \]

Now, assuming that we can interchange the limit and the summation in the
preceding and applying Lemma 6.2, we obtain

\[ P'_{ij}(t) = \sum_{k \ne i} q_{ik} P_{kj}(t) - v_i P_{ij}(t) \]

It turns out that this interchange can indeed be justified and, hence, we have the
following theorem.

Theorem 6.1 (Kolmogorov's Backward Equations) For all states $i$, $j$, and times
$t \ge 0$,

\[ P'_{ij}(t) = \sum_{k \ne i} q_{ik} P_{kj}(t) - v_i P_{ij}(t) \]


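Example 6.9 The backward equations for the pure birth process become

\[ P'_{ij}(t) = \lambda_i P_{i+1,j}(t) - \lambda_i P_{ij}(t) \] ■

Example 6.10 The backward equations for the birth and death process become

\[ P'_{0j}(t) = \lambda_0 P_{1j}(t) - \lambda_0 P_{0j}(t), \]
\[ P'_{ij}(t) = (\lambda_i + \mu_i)\left[ \frac{\lambda_i}{\lambda_i + \mu_i} P_{i+1,j}(t) + \frac{\mu_i}{\lambda_i + \mu_i} P_{i-1,j}(t) \right] - (\lambda_i + \mu_i) P_{ij}(t) \]

or equivalently,

\[ P'_{0j}(t) = \lambda_0 [P_{1j}(t) - P_{0j}(t)], \]
\[ P'_{ij}(t) = \lambda_i P_{i+1,j}(t) + \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i) P_{ij}(t), \quad i > 0 \qquad (6.9) \] ■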


Example 6.11 (A Continuous-Time Markov Chain Consisting of Two States) Consider
a machine that works for an exponential amount of time having mean $1/\lambda$
before breaking down; and suppose that it takes an exponential amount of time
having mean $1/\mu$ to repair the machine. If the machine is in working condition
at time 0, then what is the probability that it will be working at time $t = 10$?

To answer this question, we note that the process is a birth and death process
(with state 0 meaning that the machine is working and state 1 that it is being
repaired) having parameters

\[ \lambda_0 = \lambda, \qquad \mu_1 = \mu, \]
\[ \lambda_i = 0,\ i \ne 0, \qquad \mu_i = 0,\ i \ne 1 \]

We shall derive the desired probability, namely, $P_{00}(10)$ by solving the set of
differential equations given in Example 6.10. From Equation (6.9), we obtain

\[ P'_{00}(t) = \lambda [P_{10}(t) - P_{00}(t)], \qquad (6.10) \]
\[ P'_{10}(t) = \mu P_{00}(t) - \mu P_{10}(t) \qquad (6.11) \]

Multiplying Equation (6.10) by $\mu$ and Equation (6.11) by $\lambda$ and then adding the
two equations yields

\[ \mu P'_{00}(t) + \lambda P'_{10}(t) = 0 \]

By integrating, we obtain

\[ \mu P_{00}(t) + \lambda P_{10}(t) = c \]

However, since $P_{00}(0) = 1$ and $P_{10}(0) = 0$, we obtain $c = \mu$ and hence,

\[ \mu P_{00}(t) + \lambda P_{10}(t) = \mu \qquad (6.12) \]

or equivalently,

\[ \lambda P_{10}(t) = \mu [1 - P_{00}(t)] \]

By substituting this result in Equation (6.10), we obtain

\[ P'_{00}(t) = \mu [1 - P_{00}(t)] - \lambda P_{00}(t) = \mu - (\mu + \lambda) P_{00}(t) \]

Letting

\[ h(t) = P_{00}(t) - \frac{\mu}{\mu + \lambda} \]

we have

\[ h'(t) = \mu - (\mu + \lambda)\left[ h(t) + \frac{\mu}{\mu + \lambda} \right] = -(\mu + \lambda) h(t) \]

or

\[ \frac{h'(t)}{h(t)} = -(\mu + \lambda) \]

By integrating both sides, we obtain

\[ \log h(t) = -(\mu + \lambda)t + c \]

or

\[ h(t) = Ke^{-(\mu + \lambda)t} \]

and thus

\[ P_{00}(t) = Ke^{-(\mu + \lambda)t} + \frac{\mu}{\mu + \lambda} \]

which finally yields, by setting $t = 0$ and using the fact that $P_{00}(0) = 1$,

\[ P_{00}(t) = \frac{\lambda}{\mu + \lambda} e^{-(\mu + \lambda)t} + \frac{\mu}{\mu + \lambda} \]

From Equation (6.12), this also implies that

\[ P_{10}(t) = \frac{\mu}{\mu + \lambda} - \frac{\mu}{\mu + \lambda} e^{-(\mu + \lambda)t} \]

Hence, our desired probability is as follows:

\[ P_{00}(10) = \frac{\lambda}{\mu + \lambda} e^{-10(\mu + \lambda)} + \frac{\mu}{\mu + \lambda} \] ■
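As a check on Example 6.11, the closed forms for $P_{00}(t)$ and $P_{10}(t)$ can be compared with a direct numerical integration of Equations (6.10) and (6.11). This is a sketch only: the function names, the Runge-Kutta integrator, and the numerical rates are illustrative assumptions rather than anything prescribed by the text.

```python
import math


def p00(t, lam, mu):
    """Closed form: probability of working at time t, given working at time 0."""
    total = lam + mu
    return lam / total * math.exp(-total * t) + mu / total


def p10(t, lam, mu):
    """Closed form: probability of working at time t, given in repair at time 0."""
    total = lam + mu
    return mu / total - mu / total * math.exp(-total * t)


def integrate_backward(t, lam, mu, steps=20000):
    """RK4 integration of (6.10)-(6.11) from P00(0) = 1, P10(0) = 0."""
    def deriv(a, b):
        # a stands for P00, b for P10
        return lam * (b - a), mu * (a - b)

    h = t / steps
    a, b = 1.0, 0.0
    for _ in range(steps):
        k1a, k1b = deriv(a, b)
        k2a, k2b = deriv(a + 0.5 * h * k1a, b + 0.5 * h * k1b)
        k3a, k3b = deriv(a + 0.5 * h * k2a, b + 0.5 * h * k2b)
        k4a, k4b = deriv(a + h * k3a, b + h * k3b)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return a, b


if __name__ == "__main__":
    lam, mu = 0.1, 0.5  # hypothetical failure and repair rates
    print(p00(10.0, lam, mu), p10(10.0, lam, mu))
    print(integrate_backward(10.0, lam, mu))
```

For large $t$ both probabilities approach $\mu/(\mu + \lambda)$, as the closed forms show once the exponential term has decayed.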


Another set of differential equations, different from the backward equations,
may also be derived. This set of equations, known as Kolmogorov's forward
equations, is derived as follows. From the Chapman-Kolmogorov equations
(Lemma 6.3), we have

\[ P_{ij}(t + h) - P_{ij}(t) = \sum_{k=0}^{\infty} P_{ik}(t) P_{kj}(h) - P_{ij}(t) \]
\[ \quad = \sum_{k \ne j} P_{ik}(t) P_{kj}(h) - [1 - P_{jj}(h)] P_{ij}(t) \]

and thus

\[ \lim_{h \to 0} \frac{P_{ij}(t + h) - P_{ij}(t)}{h}
   = \lim_{h \to 0} \left\{ \sum_{k \ne j} P_{ik}(t) \frac{P_{kj}(h)}{h} - \left[ \frac{1 - P_{jj}(h)}{h} \right] P_{ij}(t) \right\} \]

and, assuming that we can interchange limit with summation, we obtain from
Lemma 6.2

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj} P_{ik}(t) - v_j P_{ij}(t) \]

Unfortunately, we cannot always justify the interchange of limit and summation
and thus the preceding is not always valid. However, the equations do hold in most
models, including all birth and death processes and all finite state models. We thus have
the following.

Theorem 6.2 (Kolmogorov's Forward Equations) Under suitable regularity
conditions,

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj} P_{ik}(t) - v_j P_{ij}(t) \qquad (6.13) \]
per 


We shall now solve the forward equations for the pure birth process. For this
process, Equation (6.13) reduces to

\[ P'_{ij}(t) = \lambda_{j-1} P_{i,j-1}(t) - \lambda_j P_{ij}(t) \]

However, by noting that $P_{ij}(t) = 0$ whenever $j < i$ (since no deaths can occur),
we can rewrite the preceding equation to obtain

\[ P'_{ii}(t) = -\lambda_i P_{ii}(t), \]
\[ P'_{ij}(t) = \lambda_{j-1} P_{i,j-1}(t) - \lambda_j P_{ij}(t), \quad j \ge i + 1 \qquad (6.14) \]

Proposition 6.4 For a pure birth process,

\[ P_{ii}(t) = e^{-\lambda_i t}, \quad i \ge 0 \]
\[ P_{ij}(t) = \lambda_{j-1} e^{-\lambda_j t} \int_0^t e^{\lambda_j s} P_{i,j-1}(s)\, ds, \quad j \ge i + 1 \]

Proof. The fact that $P_{ii}(t) = e^{-\lambda_i t}$ follows from Equation (6.14) by integrating
and using the fact that $P_{ii}(0) = 1$. To prove the corresponding result for $P_{ij}(t)$,
we note by Equation (6.14) that

\[ e^{\lambda_j t}\left[ P'_{ij}(t) + \lambda_j P_{ij}(t) \right] = e^{\lambda_j t} \lambda_{j-1} P_{i,j-1}(t) \]

or

\[ \frac{d}{dt}\left[ e^{\lambda_j t} P_{ij}(t) \right] = \lambda_{j-1} e^{\lambda_j t} P_{i,j-1}(t) \]

Hence, since $P_{ij}(0) = 0$, we obtain the desired results. ■

Example 6.12 (Forward Equations for Birth and Death Process) The forward
equations (Equation 6.13) for the general birth and death process become

\[ P'_{i0}(t) = \sum_{k \ne 0} q_{k0} P_{ik}(t) - \lambda_0 P_{i0}(t)
   = \mu_1 P_{i1}(t) - \lambda_0 P_{i0}(t) \qquad (6.15) \]
\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj} P_{ik}(t) - (\lambda_j + \mu_j) P_{ij}(t) \]
\[ \quad = \lambda_{j-1} P_{i,j-1}(t) + \mu_{j+1} P_{i,j+1}(t) - (\lambda_j + \mu_j) P_{ij}(t) \qquad (6.16) \] ■


6.5 Limiting Probabilities 


In analogy with a basic result in discrete-time Markov chains, the probability
that a continuous-time Markov chain will be in state $j$ at time $t$ often converges
to a limiting value that is independent of the initial state. That is, if we call this
value $P_j$, then

\[ P_j \equiv \lim_{t \to \infty} P_{ij}(t) \]

where we are assuming that the limit exists and is independent of the initial state $i$.
To derive a set of equations for the $P_j$, consider first the set of forward equations

\[ P'_{ij}(t) = \sum_{k \ne j} q_{kj} P_{ik}(t) - v_j P_{ij}(t) \qquad (6.17) \]

Now, if we let $t$ approach $\infty$, then assuming that we can interchange limit and
summation, we obtain

\[ \lim_{t \to \infty} P'_{ij}(t) = \lim_{t \to \infty} \left[ \sum_{k \ne j} q_{kj} P_{ik}(t) - v_j P_{ij}(t) \right]
   = \sum_{k \ne j} q_{kj} P_k - v_j P_j \]

However, as $P_{ij}(t)$ is a bounded function (being a probability it is always
between 0 and 1), it follows that if $P'_{ij}(t)$ converges, then it must converge to 0
(why is this?). Hence, we must have

\[ 0 = \sum_{k \ne j} q_{kj} P_k - v_j P_j \]

or

\[ v_j P_j = \sum_{k \ne j} q_{kj} P_k, \quad \text{all states } j \qquad (6.18) \]

The preceding set of equations, along with the equation

\[ \sum_j P_j = 1 \qquad (6.19) \]

can be used to solve for the limiting probabilities.


Remarks

(i) We have assumed that the limiting probabilities $P_j$ exist. A sufficient condition for
this is that

    (a) all states of the Markov chain communicate in the sense that starting in state $i$
    there is a positive probability of ever being in state $j$, for all $i, j$ and

    (b) the Markov chain is positive recurrent in the sense that, starting in any state,
    the mean time to return to that state is finite.

If conditions (a) and (b) hold, then the limiting probabilities will exist and satisfy
Equations (6.18) and (6.19). In addition, $P_j$ also will have the interpretation of being
the long-run proportion of time that the process is in state $j$.

(ii) Equations (6.18) and (6.19) have a nice interpretation: In any interval $(0, t)$ the
number of transitions into state $j$ must equal to within 1 the number of transitions
out of state $j$ (why?). Hence, in the long run, the rate at which transitions into state $j$
occur must equal the rate at which transitions out of state $j$ occur. When the process
is in state $j$, it leaves at rate $v_j$, and, as $P_j$ is the proportion of time it is in state $j$, it
thus follows that

    $v_j P_j$ = rate at which the process leaves state $j$

Similarly, when the process is in state $k$, it enters $j$ at a rate $q_{kj}$. Hence, as $P_k$ is the
proportion of time in state $k$, we see that the rate at which transitions from $k$ to $j$
occur is just $q_{kj} P_k$; thus

    $\sum_{k \ne j} q_{kj} P_k$ = rate at which the process enters state $j$

So, Equation (6.18) is just a statement of the equality of the rates at which the process
enters and leaves state $j$. Because it balances (that is, equates) these rates, Equation
(6.18) is sometimes referred to as a set of "balance equations."

(iii) When the limiting probabilities $P_j$ exist, we say that the chain is ergodic. The $P_j$
are sometimes called stationary probabilities since it can be shown that (as in the
discrete-time case) if the initial state is chosen according to the distribution $\{P_j\}$, then
the probability of being in state $j$ at time $t$ is $P_j$, for all $t$.


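Let us now determine the limiting probabilities for a birth and death process.
From Equation (6.18) or equivalently, by equating the rate at which the process
leaves a state with the rate at which it enters that state, we obtain


State          Rate at which leave = rate at which enter
0              $\lambda_0 P_0 = \mu_1 P_1$
1              $(\lambda_1 + \mu_1) P_1 = \mu_2 P_2 + \lambda_0 P_0$
2              $(\lambda_2 + \mu_2) P_2 = \mu_3 P_3 + \lambda_1 P_1$
$n, n \ge 1$   $(\lambda_n + \mu_n) P_n = \mu_{n+1} P_{n+1} + \lambda_{n-1} P_{n-1}$


By adding to each equation the equation preceding it, we obtain

\[ \lambda_0 P_0 = \mu_1 P_1, \]
\[ \lambda_1 P_1 = \mu_2 P_2, \]
\[ \lambda_2 P_2 = \mu_3 P_3, \]
\[ \vdots \]
\[ \lambda_n P_n = \mu_{n+1} P_{n+1}, \quad n \ge 0 \]

Solving in terms of $P_0$ yields

\[ P_1 = \frac{\lambda_0}{\mu_1} P_0, \]
\[ P_2 = \frac{\lambda_1}{\mu_2} P_1 = \frac{\lambda_1 \lambda_0}{\mu_2 \mu_1} P_0, \]
\[ P_3 = \frac{\lambda_2}{\mu_3} P_2 = \frac{\lambda_2 \lambda_1 \lambda_0}{\mu_3 \mu_2 \mu_1} P_0, \]
\[ \vdots \]
\[ P_n = \frac{\lambda_{n-1}}{\mu_n} P_{n-1} = \frac{\lambda_{n-1} \lambda_{n-2} \cdots \lambda_1 \lambda_0}{\mu_n \mu_{n-1} \cdots \mu_2 \mu_1} P_0 \]

And by using the fact that $\sum_{n=0}^{\infty} P_n = 1$, we obtain

\[ 1 = P_0 + P_0 \sum_{n=1}^{\infty} \frac{\lambda_{n-1} \cdots \lambda_1 \lambda_0}{\mu_n \cdots \mu_2 \mu_1} \]

or

\[ P_0 = \frac{1}{1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}} \]

and so

\[ P_n = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n \left( 1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} \right)}, \quad n \ge 1 \qquad (6.20) \]

The foregoing equations also show us what condition is necessary for these
limiting probabilities to exist. Namely, it is necessary that

\[ \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} < \infty \qquad (6.21) \]

This condition also may be shown to be sufficient.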
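Equation (6.20) can be evaluated by accumulating the products $\lambda_0 \cdots \lambda_{n-1}/(\mu_1 \cdots \mu_n)$ one factor at a time. The sketch below assumes the rates are given as callables and that the state space is either finite or truncated at a hypothetical `max_state`; for an infinite state space the truncation is an approximation that is only sensible when Condition (6.21) holds and the neglected tail is small.

```python
def birth_death_limiting(lam, mu, max_state):
    """Limiting probabilities P_0..P_max_state from Equation (6.20).

    lam(n) and mu(n) are the birth and death rates; states above max_state are
    ignored (exact if lam(max_state) == 0, otherwise a truncation).
    """
    weights = [1.0]  # weight for state 0; weight n is lam_0...lam_{n-1} / (mu_1...mu_n)
    for n in range(1, max_state + 1):
        weights.append(weights[-1] * lam(n - 1) / mu(n))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical constant rates lambda = 1, mu = 2, truncated at 60 states.
    probs = birth_death_limiting(lambda n: 1.0, lambda n: 2.0, 60)
    print(probs[:4])
```

Building each weight from the previous one avoids recomputing the products and keeps the cost linear in the number of states.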


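In the multiserver exponential queueing system (Example 6.6), Condition
(6.21) reduces to

\[ \sum_{n=s+1}^{\infty} \frac{\lambda^n}{(s\mu)^n} < \infty \]

which is equivalent to $\lambda/(s\mu) < 1$.

For the linear growth model with immigration (Example 6.4), Condition (6.21)
reduces to

\[ \sum_{n=1}^{\infty} \frac{\theta(\theta + \lambda) \cdots (\theta + (n - 1)\lambda)}{n!\, \mu^n} < \infty \]

Using the ratio test, the preceding will converge when

\[ \lim_{n \to \infty} \frac{\theta(\theta + \lambda) \cdots (\theta + n\lambda)}{(n + 1)!\, \mu^{n+1}} \cdot \frac{n!\, \mu^n}{\theta(\theta + \lambda) \cdots (\theta + (n - 1)\lambda)}
   = \lim_{n \to \infty} \frac{\theta + n\lambda}{(n + 1)\mu} = \frac{\lambda}{\mu} < 1 \]

That is, the condition is satisfied when $\lambda < \mu$. When $\lambda \ge \mu$ it is easy to show that
Condition (6.21) is not satisfied.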


Example 6.13 (A Machine Repair Model) Consider a job shop that consists of
$M$ machines and one serviceman. Suppose that the amount of time each machine
runs before breaking down is exponentially distributed with mean $1/\lambda$, and suppose
that the amount of time that it takes for the serviceman to fix a machine
is exponentially distributed with mean $1/\mu$. We shall attempt to answer these
questions: (a) What is the average number of machines not in use? (b) What
proportion of time is each machine in use?

Solution: If we say that the system is in state $n$ whenever $n$ machines are not
in use, then the preceding is a birth and death process having parameters

\[ \mu_n = \mu, \quad n \ge 1 \]
\[ \lambda_n = \begin{cases} (M - n)\lambda, & n \le M \\ 0, & n > M \end{cases} \]

This is so in the sense that a failing machine is regarded as an arrival and a
fixed machine as a departure. If any machines are broken down, then since the
serviceman's rate is $\mu$, $\mu_n = \mu$. On the other hand, if $n$ machines are not in
use, then since the $M - n$ machines in use each fail at a rate $\lambda$, it follows that
$\lambda_n = (M - n)\lambda$. From Equation (6.20) we have that $P_n$, the probability that $n$
machines will not be in use, is given by

\[ P_0 = \frac{1}{1 + \sum_{n=1}^{M} \left[ M\lambda (M - 1)\lambda \cdots (M - n + 1)\lambda / \mu^n \right]}
       = \frac{1}{1 + \sum_{n=1}^{M} (\lambda/\mu)^n M!/(M - n)!}, \]

\[ P_n = \frac{(\lambda/\mu)^n M!/(M - n)!}{1 + \sum_{n=1}^{M} (\lambda/\mu)^n M!/(M - n)!}, \quad n = 0, 1, \ldots, M \]

Hence, the average number of machines not in use is given by

\[ \sum_{n=0}^{M} n P_n = \frac{\sum_{n=0}^{M} n (\lambda/\mu)^n M!/(M - n)!}{1 + \sum_{n=1}^{M} (\lambda/\mu)^n M!/(M - n)!} \qquad (6.22) \]

To obtain the long-run proportion of time that a given machine is working we
will compute the equivalent limiting probability of the machine working. To
do so, we condition on the number of machines that are not working to obtain

\[ P\{\text{machine is working}\} = \sum_{n=0}^{M} P\{\text{machine is working} \mid n \text{ not working}\} P_n \]
\[ \quad = \sum_{n=0}^{M} \frac{M - n}{M} P_n \quad \text{(since if } n \text{ are not working, then } M - n \text{ are working!)} \]
\[ \quad = 1 - \sum_{n=0}^{M} \frac{n P_n}{M} \]

where $\sum_{n=0}^{M} n P_n$ is given by Equation (6.22). ■
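The two quantities asked for in Example 6.13 can be computed in a few lines. This is a minimal sketch with a hypothetical function name and illustrative parameter values; it evaluates $P_n$, the average number of machines not in use from Equation (6.22), and the long-run proportion of time a given machine is working.

```python
from math import factorial


def machine_repair(M, lam, mu):
    """Return (probs, avg_down, frac_working) for M machines and one serviceman.

    probs[n] is the limiting probability that n machines are not in use.
    """
    ratio = lam / mu
    # Unnormalized weights (lam/mu)^n * M! / (M - n)!, n = 0..M (the n = 0 weight is 1).
    weights = [ratio ** n * factorial(M) / factorial(M - n) for n in range(M + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    avg_down = sum(n * p for n, p in enumerate(probs))  # Equation (6.22)
    frac_working = 1.0 - avg_down / M
    return probs, avg_down, frac_working


if __name__ == "__main__":
    # Hypothetical shop: 5 machines, failure rate 0.2, repair rate 1.0.
    probs, avg_down, frac_working = machine_repair(5, 0.2, 1.0)
    print(avg_down, frac_working)
```

With $M = 1$ the model collapses to a two-state chain, for which the function returns $P_0 = \mu/(\lambda + \mu)$, a convenient sanity check.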


Example 6.14 (The M/M/1 Queue) In the M/M/1 queue $\lambda_n = \lambda$, $\mu_n = \mu$ and
thus, from Equation (6.20),

\[ P_n = \frac{(\lambda/\mu)^n}{1 + \sum_{n=1}^{\infty} (\lambda/\mu)^n}
       = (\lambda/\mu)^n (1 - \lambda/\mu), \quad n \ge 0 \]

provided that $\lambda/\mu < 1$. It is intuitive that $\lambda$ must be less than $\mu$ for limiting
probabilities to exist. Customers arrive at rate $\lambda$ and are served at rate $\mu$, and
thus if $\lambda > \mu$, then they arrive at a faster rate than they can be served and the
queue size will go to infinity. The case $\lambda = \mu$ behaves much like the symmetric
random walk of Section 4.3, which is null recurrent and thus has no limiting
probabilities. ■


Example 6.15 Let us reconsider the shoe shine shop of Example 6.1, and
determine the proportion of time the process is in each of the states 0, 1, 2. Because
this is not a birth and death process (since the process can go directly from state
2 to state 0), we start with the balance equations for the limiting probabilities.


State    Rate that the process leaves = rate that the process enters
0        $\lambda P_0 = \mu_2 P_2$
1        $\mu_1 P_1 = \lambda P_0$
2        $\mu_2 P_2 = \mu_1 P_1$


Solving in terms of $P_0$ yields

\[ P_2 = \frac{\lambda}{\mu_2} P_0, \qquad P_1 = \frac{\lambda}{\mu_1} P_0 \]

which implies, since $P_0 + P_1 + P_2 = 1$, that

\[ P_0 \left[ 1 + \frac{\lambda}{\mu_2} + \frac{\lambda}{\mu_1} \right] = 1 \]

or

\[ P_0 = \frac{\mu_1 \mu_2}{\mu_1 \mu_2 + \lambda(\mu_1 + \mu_2)} \]

and

\[ P_1 = \frac{\lambda \mu_2}{\mu_1 \mu_2 + \lambda(\mu_1 + \mu_2)}, \]
\[ P_2 = \frac{\lambda \mu_1}{\mu_1 \mu_2 + \lambda(\mu_1 + \mu_2)} \] ■
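For chains that are not birth and death processes, the balance equations (6.18) together with (6.19) form a linear system that can be solved mechanically. The sketch below is one possible approach, not the method of the text: it assumes a finite chain in which all states communicate, takes a hypothetical matrix `q` of instantaneous rates with `q[k][j]` $= q_{kj}$, and uses exact rational Gauss-Jordan elimination. The numerical rates for the shoe shine shop are illustrative.

```python
from fractions import Fraction


def limiting_probabilities(q):
    """Solve v_j P_j = sum_{k != j} q[k][j] P_k together with sum_j P_j = 1.

    q is a square list of lists of nonnegative rates; diagonal entries are ignored.
    Assumes a finite chain whose states all communicate, so the solution is unique.
    """
    n = len(q)
    rows = []
    for j in range(n):
        leave_rate = sum(Fraction(q[j][m]) for m in range(n) if m != j)  # v_j
        row = [Fraction(q[k][j]) if k != j else -leave_rate for k in range(n)]
        rows.append(row + [Fraction(0)])
    # One balance equation is redundant; replace it by the normalization (6.19).
    rows[-1] = [Fraction(1)] * n + [Fraction(1)]

    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


if __name__ == "__main__":
    # Shoe shine shop with hypothetical rates lambda = 1, mu_1 = 2, mu_2 = 3:
    # 0 -> 1 at rate lambda, 1 -> 2 at rate mu_1, 2 -> 0 at rate mu_2.
    q = [[0, 1, 0],
         [0, 0, 2],
         [3, 0, 0]]
    print(limiting_probabilities(q))
```

Using `Fraction` keeps the arithmetic exact, so the output can be compared term by term with the closed forms of Example 6.15.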


Example 6.16 Consider a set of m components along with a single repairman. 
Suppose that component i functions for an exponentially distributed time with 
rate 4; and then fails. The time it then takes to repair component i is exponen- 
tial with rate j,i = 1,...,2. Suppose that when there is more than one failed 
component the repairman always works on the most recent failure. For instance, 
if there are at present two failed components—say, components 1 and 2 of which 
1 has failed most recently—then the repairman will be working on component 
1. However, if component 3 should fail before 1’s repair is completed, then the 
repairman would stop working on component 1 and switch to component 3 (that 
is, a newly failed component preempts service). 

To analyze the preceding as a continuous-time Markov chain, the state must
represent the set of failed components in the order of failure. That is, the state
will be $i_1, i_2, \ldots, i_k$ if $i_1, i_2, \ldots, i_k$ are the $k$ failed components (all the other $n - k$
being functional) with $i_1$ having been the most recent failure (and is thus presently
being repaired), $i_2$ the second most recent, and so on. Because there are $k!$ possible
orderings for a fixed set of $k$ failed components and $\binom{n}{k}$ choices of that set, it
follows that there are

$$\sum_{k=0}^{n} \binom{n}{k} k! = \sum_{k=0}^{n} \frac{n!}{(n-k)!} = n! \sum_{i=0}^{n} \frac{1}{i!}$$

possible states.
The balance equations for the limiting probabilities are as follows:

$$\Bigl( \mu_{i_1} + \sum_{\substack{i \ne i_j \\ j = 1, \ldots, k}} \lambda_i \Bigr) P(i_1, \ldots, i_k) = \sum_{\substack{i \ne i_j \\ j = 1, \ldots, k}} P(i, i_1, \ldots, i_k)\, \mu_i + P(i_2, \ldots, i_k)\, \lambda_{i_1},$$

$$\sum_{i=1}^{n} \lambda_i P(\phi) = \sum_{i=1}^{n} P(i)\, \mu_i \qquad (6.23)$$

where $\phi$ is the state when all components are working. The preceding equations
follow because state $i_1, \ldots, i_k$ can be left either by a failure of any of the additional
components or by a repair completion of component $i_1$. Also, that state can be
entered either by a repair completion of component $i$ when the state is $i, i_1, \ldots, i_k$
or by a failure of component $i_1$ when the state is $i_2, \ldots, i_k$.
However, if we take

$$P(i_1, \ldots, i_k) = \frac{\lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_k}}{\mu_{i_1} \mu_{i_2} \cdots \mu_{i_k}} P(\phi) \qquad (6.24)$$

then it is easily seen that Equations (6.23) are satisfied. Hence, by uniqueness
these must be the limiting probabilities with $P(\phi)$ determined to make their sum
equal 1. That is,

$$P(\phi) = \Bigl[ 1 + \sum_{i_1, \ldots, i_k} \frac{\lambda_{i_1} \cdots \lambda_{i_k}}{\mu_{i_1} \cdots \mu_{i_k}} \Bigr]^{-1}$$

As an illustration, suppose $n = 2$ and so there are five states $\phi$, 1, 2, 12, 21. Then
from the preceding we would have

$$P(\phi) = \Bigl[ 1 + \frac{\lambda_1}{\mu_1} + \frac{\lambda_2}{\mu_2} + \frac{2 \lambda_1 \lambda_2}{\mu_1 \mu_2} \Bigr]^{-1},$$

$$P(1) = \frac{\lambda_1}{\mu_1} P(\phi),$$

$$P(2) = \frac{\lambda_2}{\mu_2} P(\phi),$$

$$P(1, 2) = P(2, 1) = \frac{\lambda_1 \lambda_2}{\mu_1 \mu_2} P(\phi)$$

It is interesting to note, using Equation (6.24), that given the set of failed
components, each of the possible orderings of these components is equally likely. ■
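The claim that the product form (6.24) satisfies the balance equations (6.23) can be checked by brute force for a small system. The sketch below enumerates every ordered state; the function names, the 0-based component labels, and the numerical rates are assumptions made for illustration.

```python
from itertools import permutations
from math import prod


def preemptive_repair_probs(lam, mu):
    """Product-form probabilities (6.24); a state is a tuple of failed
    components, most recent failure first, and () is the all-working state."""
    n = len(lam)
    weights = {}
    for k in range(n + 1):
        for state in permutations(range(n), k):
            weights[state] = prod(lam[i] / mu[i] for i in state)
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def balance_residual(probs, lam, mu, state):
    """Rate out of `state` minus rate into it, following Equations (6.23)."""
    working = [i for i in range(len(lam)) if i not in state]
    out_rate = sum(lam[i] for i in working) + (mu[state[0]] if state else 0.0)
    inflow = sum(probs[(i,) + state] * mu[i] for i in working)
    if state:
        inflow += probs[state[1:]] * lam[state[0]]
    return out_rate * probs[state] - inflow


# Illustrative rates (assumed) for n = 2 components.
lam, mu = [1.0, 2.0], [3.0, 4.0]
probs = preemptive_repair_probs(lam, mu)
worst = max(abs(balance_residual(probs, lam, mu, s)) for s in probs)
print(len(probs), worst, probs[(0, 1)], probs[(1, 0)])
```

With two components there are five states, and the two orderings of the same failed set receive equal probability, in line with the closing remark of the example.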


6.6 Time Reversibility 


Consider a continuous-time Markov chain that is ergodic and let us consider the
limiting probabilities $P_i$ from a different point of view than previously. If we
consider the sequence of states visited, ignoring the amount of time spent in each state
during a visit, then this sequence constitutes a discrete-time Markov chain with
transition probabilities $P_{ij}$. Let us assume that this discrete-time Markov chain,
called the embedded chain, is ergodic and denote by $\pi_i$ its limiting probabilities.
That is, the $\pi_i$ are the unique solution of

$$\pi_i = \sum_j \pi_j P_{ji}, \quad \text{all } i$$

$$\sum_i \pi_i = 1$$

Now, since $\pi_i$ represents the proportion of transitions that take the process into
state $i$, and because $1/v_i$ is the mean time spent in state $i$ during a visit, it seems
intuitive that $P_i$, the proportion of time in state $i$, should be a weighted average
of the $\pi_i$ where $\pi_i$ is weighted proportionately to $1/v_i$. That is, it is intuitive
that

$$P_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j} \qquad (6.25)$$

To check the preceding, recall that the limiting probabilities $P_i$ must satisfy

$$v_i P_i = \sum_{j \ne i} P_j q_{ji}, \quad \text{all } i$$

or equivalently, since $P_{ii} = 0$,

$$v_i P_i = \sum_j P_j v_j P_{ji}, \quad \text{all } i$$

Hence, for the $P_i$s to be given by Equation (6.25), the following would be
necessary:

$$\pi_i = \sum_j \pi_j P_{ji}, \quad \text{all } i$$

But this, of course, follows since it is in fact the defining equation for the $\pi_i$s.
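Equation (6.25) translates directly into a short computation: weight each embedded-chain probability by the mean holding time and renormalize. The following is a minimal sketch; the function name `time_proportions` and the two-state numbers are illustrative assumptions.

```python
def time_proportions(pi, v):
    """Convert embedded-chain limiting probabilities pi_i and leaving rates v_i
    into time proportions P_i = (pi_i / v_i) / sum_j (pi_j / v_j)."""
    weights = [p / rate for p, rate in zip(pi, v)]
    total = sum(weights)
    return [w / total for w in weights]


# Two-state chain that alternates between its states, so pi = (1/2, 1/2);
# the leaving rates 1 and 3 are assumed for illustration.
proportions = time_proportions([0.5, 0.5], [1.0, 3.0])
print(proportions)
```

Although both states are visited equally often, the state that is left more slowly accounts for three quarters of the time here.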

Suppose now that the continuous-time Markov chain has been in operation
for a long time, and suppose that starting at some (large) time $T$ we trace the
process going backward in time. To determine the probability structure of this
reversed process, we first note that given we are in state $i$ at some time, say $t$,
the probability that we have been in this state for an amount of time greater
than $s$ is just $e^{-v_i s}$. This is so, since

$$P\{\text{process is in state } i \text{ throughout } [t - s, t] \mid X(t) = i\}$$

$$= \frac{P\{\text{process is in state } i \text{ throughout } [t - s, t]\}}{P\{X(t) = i\}}$$

$$= \frac{P\{X(t - s) = i\}\, e^{-v_i s}}{P\{X(t) = i\}}$$

$$= e^{-v_i s}$$

since for $t$ large $P\{X(t - s) = i\} = P\{X(t) = i\} = P_i$.

In other words, going backward in time, the amount of time the process spends
in state $i$ is also exponentially distributed with rate $v_i$. In addition, as was shown
in Section 4.8, the sequence of states visited by the reversed process constitutes
a discrete-time Markov chain with transition probabilities $Q_{ij}$ given by

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$

Hence, we see from the preceding that the reversed process is a continuous-time
Markov chain with the same transition rates as the forward-time process
and with one-stage transition probabilities $Q_{ij}$. Therefore, the continuous-time
Markov chain will be time reversible, in the sense that the process reversed in
time has the same probabilistic structure as the original process, if the embedded
chain is time reversible. That is, if

$$\pi_i P_{ij} = \pi_j P_{ji}, \quad \text{for all } i, j$$

Now, using the fact that $P_i = (\pi_i / v_i) / (\sum_j \pi_j / v_j)$, we see that the preceding
condition is equivalent to

$$P_i q_{ij} = P_j q_{ji}, \quad \text{for all } i, j \qquad (6.26)$$


Since $P_i$ is the proportion of time in state $i$ and $q_{ij}$ is the rate when in state
$i$ that the process goes to $j$, the condition of time reversibility is that the rate
at which the process goes directly from state $i$ to state $j$ is equal to the rate at
which it goes directly from $j$ to $i$. It should be noted that this is exactly the same
condition needed for an ergodic discrete-time Markov chain to be time reversible
(see Section 4.8).

An application of the preceding condition for time reversibility yields the fol- 
lowing proposition concerning birth and death processes. 


Proposition 6.5 An ergodic birth and death process is time reversible. 


Proof. We must show that the rate at which a birth and death process goes from
state $i$ to state $i + 1$ is equal to the rate at which it goes from $i + 1$ to $i$. In
any length of time $t$ the number of transitions from $i$ to $i + 1$ must equal to
within 1 the number from $i + 1$ to $i$ (since between each transition from $i$ to
$i + 1$ the process must return to $i$, and this can only occur through $i + 1$,
and vice versa). Hence, as the number of such transitions goes to infinity as
$t \to \infty$, it follows that the rate of transitions from $i$ to $i + 1$ equals the rate from
$i + 1$ to $i$. ■


Proposition 6.5 can be used to prove the important result that the output 
process of an M/M/s queue is a Poisson process. We state this as a corollary. 


Corollary 6.6 Consider an M/M/s queue in which customers arrive in accordance
with a Poisson process having rate $\lambda$ and are served by any one of $s$ servers,
each having an exponentially distributed service time with rate $\mu$. If $\lambda < s\mu$, then
the output process of customers departing is, after the process has been in
operation for a long time, a Poisson process with rate $\lambda$.

Figure 6.1. The number in the system. Each × on the time axis marks a time at
which, going backward in time, $X(t)$ increases; equivalently, a time at which,
going forward in time, $X(t)$ decreases.


Proof. Let $X(t)$ denote the number of customers in the system at time $t$. Since
the M/M/s process is a birth and death process, it follows from Proposition 6.5
that $\{X(t), t \ge 0\}$ is time reversible. Going forward in time, the time points at
which $X(t)$ increases by 1 constitute a Poisson process since these are just the
arrival times of customers. Hence, by time reversibility the time points at which
$X(t)$ increases by 1 when we go backward in time also constitute a Poisson
process. But these latter points are exactly the points of time when customers
depart (see Figure 6.1). Hence, the departure times constitute a Poisson process
with rate $\lambda$. ■


Example 6.17 Consider a first come first served M/M/1 queue, with arrival rate
$\lambda$ and service rate $\mu$, where $\lambda < \mu$, that is in steady state. Given that customer C
spends a total of $t$ time units in the system, what is the conditional distribution
of the number of others that were present when C arrived?

Solution: Suppose that C arrived at time $s$ and departed at time $s + t$. Because
the system is first come first served, the number that were in the system when C
arrived is equal to the number of departures of other customers that occur after
time $s$ and before time $s + t$, which is equal to the number of arrivals in the
reversed process in that interval of time. Now, in the reversed process C would
have arrived at time $s + t$ and departed at time $s$. Because the reversed process
is also an M/M/1 queueing system, the number of arrivals during that interval
of length $t$ is Poisson distributed with mean $\lambda t$. (For a more direct argument
for this result, see Section 8.3.1.) ■


We have shown that a process is time reversible if and only if

$$P_i q_{ij} = P_j q_{ji} \quad \text{for all } i \ne j$$

Analogous to the result for discrete-time Markov chains, if we can find a
probability vector $P$ that satisfies the preceding then the Markov chain is time
reversible and the $P_i$s are the long-run probabilities. That is, we have the following
proposition.

Proposition 6.7 If for some set $\{P_i\}$

$$\sum_i P_i = 1, \quad P_i \ge 0$$

and

$$P_i q_{ij} = P_j q_{ji} \quad \text{for all } i \ne j \qquad (6.27)$$

then the continuous-time Markov chain is time reversible and $P_i$ represents the
limiting probability of being in state $i$.


Proof. For fixed $i$ we obtain upon summing Equation (6.27) over all $j$: $j \ne i$

$$\sum_{j \ne i} P_i q_{ij} = \sum_{j \ne i} P_j q_{ji}$$

or, since $\sum_{j \ne i} q_{ij} = v_i$,

$$v_i P_i = \sum_{j \ne i} P_j q_{ji}$$

Hence, the $P_i$s satisfy the balance equations and thus represent the limiting
probabilities. Because Equation (6.27) holds, the chain is time reversible. ■


Example 6.18 Consider a set of $n$ machines and a single repair facility to service
them. Suppose that when machine $i$, $i = 1, \ldots, n$, goes down it requires an
exponentially distributed amount of work with rate $\mu_i$ to get it back up. The repair
facility divides its efforts equally among all down components in the sense that
whenever there are $k$ down machines, $1 \le k \le n$, each receives work at a rate
of $1/k$ per unit time. Finally, suppose that each time machine $i$ goes back up it
remains up for an exponentially distributed time with rate $\lambda_i$.

The preceding can be analyzed as a continuous-time Markov chain having $2^n$
states where the state at any time corresponds to the set of machines that are down
at that time. Thus, for instance, the state will be $(i_1, i_2, \ldots, i_k)$ when machines
$i_1, \ldots, i_k$ are down and all the others are up. The instantaneous transition rates
are as follows:

$$q_{(i_1, \ldots, i_{k-1}), (i_1, \ldots, i_k)} = \lambda_{i_k},$$

$$q_{(i_1, \ldots, i_k), (i_1, \ldots, i_{k-1})} = \mu_{i_k} / k$$

where $i_1, \ldots, i_k$ are all distinct. This follows since the failure rate of machine $i_k$
is always $\lambda_{i_k}$ and the repair rate of machine $i_k$ when there are $k$ failed machines
is $\mu_{i_k} / k$.
Hence, the time reversible equations from (6.27) are

$$P(i_1, \ldots, i_k)\, \mu_{i_k} / k = P(i_1, \ldots, i_{k-1})\, \lambda_{i_k}$$

or

$$P(i_1, \ldots, i_k) = \frac{k \lambda_{i_k}}{\mu_{i_k}} P(i_1, \ldots, i_{k-1})$$

$$= \frac{k \lambda_{i_k}}{\mu_{i_k}} \frac{(k - 1) \lambda_{i_{k-1}}}{\mu_{i_{k-1}}} P(i_1, \ldots, i_{k-2}) \quad \text{upon iterating}$$

$$= k! \prod_{j=1}^{k} (\lambda_{i_j} / \mu_{i_j})\, P(\phi)$$

where $\phi$ is the state in which all components are working. Because

$$P(\phi) + \sum P(i_1, \ldots, i_k) = 1$$

we see that

$$P(\phi) = \Bigl[ 1 + \sum_{i_1, \ldots, i_k} k! \prod_{j=1}^{k} (\lambda_{i_j} / \mu_{i_j}) \Bigr]^{-1} \qquad (6.28)$$

where the preceding sum is over all the $2^n - 1$ nonempty subsets $\{i_1, \ldots, i_k\}$
of $\{1, 2, \ldots, n\}$. Hence, as the time reversible equations are satisfied for this
choice of probability vector it follows from Proposition 6.7 that the chain is
time reversible and

$$P(i_1, \ldots, i_k) = k! \prod_{j=1}^{k} (\lambda_{i_j} / \mu_{i_j})\, P(\phi)$$

with $P(\phi)$ being given by (6.28).
For instance, suppose there are two machines. Then, from the preceding we
would have

$$P(\phi) = \frac{1}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2},$$

$$P(1) = \frac{\lambda_1/\mu_1}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2},$$

$$P(2) = \frac{\lambda_2/\mu_2}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2},$$

$$P(1, 2) = \frac{2\lambda_1\lambda_2}{\mu_1\mu_2 [1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2]}$$ ■
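The shared-repair solution can be verified against the time reversibility equations (6.27) by enumerating all $2^n$ subsets of down machines. This is a sketch under stated assumptions: the function names, the 0-based machine labels, and the rates are illustrative only.

```python
from itertools import combinations
from math import factorial, prod


def shared_repair_probs(lam, mu):
    """P(S) proportional to k! * prod(lam_i / mu_i) over the k down machines
    in S, normalized as in Equation (6.28)."""
    n = len(lam)
    weights = {}
    for k in range(n + 1):
        for down in combinations(range(n), k):
            weights[frozenset(down)] = factorial(k) * prod(lam[i] / mu[i] for i in down)
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def detailed_balance_gap(probs, lam, mu):
    """Largest violation of P(S) * mu_i / k = P(S - {i}) * lam_i over all S, i."""
    gap = 0.0
    for down, p in probs.items():
        k = len(down)
        for i in down:
            gap = max(gap, abs(p * mu[i] / k - probs[down - {i}] * lam[i]))
    return gap


# Illustrative rates (assumed) for two machines.
lam, mu = [1.0, 2.0], [3.0, 4.0]
probs = shared_repair_probs(lam, mu)
gap = detailed_balance_gap(probs, lam, mu)
print(probs[frozenset()], probs[frozenset({0, 1})], gap)
```

For these rates the normalizing sum is $1 + 1/3 + 1/2 + 1/3 = 13/6$, so $P(\phi) = 6/13$ and $P(1, 2) = 2/13$.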


Consider a continuous-time Markov chain whose state space is $S$. We say that
the Markov chain is truncated to the set $A \subset S$ if $q_{ij}$ is changed to 0 for all $i \in A$,
$j \notin A$. That is, transitions out of the class $A$ are no longer allowed, whereas ones
in $A$ continue at the same rates as before. A useful result is that if the chain is
time reversible, then so is the truncated one.

Proposition 6.8 A time reversible chain with limiting probabilities $P_j$, $j \in S$, that
is truncated to the set $A \subset S$ and remains irreducible is also time reversible and
has limiting probabilities $P_j^A$ given by

$$P_j^A = \frac{P_j}{\sum_{i \in A} P_i}, \quad j \in A$$

Proof. By Proposition 6.7 we need to show that, with $P_j^A$ as given,

$$P_i^A q_{ij} = P_j^A q_{ji} \quad \text{for } i \in A,\ j \in A$$

or, equivalently,

$$P_i q_{ij} = P_j q_{ji} \quad \text{for } i \in A,\ j \in A$$

But this follows since the original chain is, by assumption, time reversible. ■


Example 6.19 Consider an M/M/1 queue in which arrivals finding $N$ in the
system do not enter. This finite capacity system can be regarded as a truncation
of the M/M/1 queue to the set of states $A = \{0, 1, \ldots, N\}$. Since the number in
the system in the M/M/1 queue is time reversible and has limiting probabilities
$P_j = (\lambda/\mu)^j (1 - \lambda/\mu)$, it follows from Proposition 6.8 that the finite capacity
model is also time reversible and has limiting probabilities given by

$$P_j = \frac{(\lambda/\mu)^j}{\sum_{i=0}^{N} (\lambda/\mu)^i}, \quad j = 0, 1, \ldots, N$$ ■
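The truncation result for the finite capacity queue amounts to renormalizing the geometric weights over the states $0, \ldots, N$. A minimal sketch follows; the function name `finite_capacity_mm1` and the parameter values are assumptions for illustration.

```python
def finite_capacity_mm1(lam, mu, capacity):
    """Limiting probabilities P_0, ..., P_N of the M/M/1 queue truncated to
    {0, ..., N}: P_j = (lam/mu)**j / sum_i (lam/mu)**i."""
    rho = lam / mu
    weights = [rho ** j for j in range(capacity + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values (assumed): lam = 1, mu = 2, room for N = 2 customers.
probs = finite_capacity_mm1(lam=1.0, mu=2.0, capacity=2)
print(probs)
```

Here the weights are 1, 1/2, and 1/4, so the probabilities are 4/7, 2/7, and 1/7, and neighboring states still satisfy $\lambda P_j = \mu P_{j+1}$.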


Another useful result is given by the following proposition, whose proof is left
as an exercise.

Proposition 6.9 If $\{X_i(t), t \ge 0\}$ are, for $i = 1, \ldots, n$, independent time reversible
continuous-time Markov chains, then the vector process $\{(X_1(t), \ldots, X_n(t)),
t \ge 0\}$ is also a time reversible continuous-time Markov chain.


Example 6.20 Consider an $n$-component system where component $i$, $i =
1, \ldots, n$, functions for an exponential time with rate $\lambda_i$ and then fails; upon
failure, repair begins on component $i$, with the repair taking an exponentially
distributed time with rate $\mu_i$. Once repaired, a component is as good as new. The
components act independently except that when there is only one working
component the system is temporarily shut down until a repair has been completed; it
then starts up again with two working components.

(a) What proportion of time is the system shut down?
(b) What is the (limiting) average number of components that are being repaired?


Solution: Consider first the system without the restriction that it is shut down
when a single component is working. Letting $X_i(t)$, $i = 1, \ldots, n$, equal 1
if component $i$ is working at time $t$ and 0 if it failed, then $\{X_i(t), t \ge 0\}$,
$i = 1, \ldots, n$, are independent birth and death processes. Because a birth and
death process is time reversible, it follows from Proposition 6.9 that the process
$\{(X_1(t), \ldots, X_n(t)), t \ge 0\}$ is also time reversible. Now, with

$$P_i(j) = \lim_{t \to \infty} P\{X_i(t) = j\}, \quad j = 0, 1$$

we have

$$P_i(1) = \frac{\mu_i}{\mu_i + \lambda_i}, \qquad P_i(0) = \frac{\lambda_i}{\mu_i + \lambda_i}$$

Also, with

$$P(j_1, \ldots, j_n) = \lim_{t \to \infty} P\{X_i(t) = j_i,\ i = 1, \ldots, n\}$$

it follows, by independence, that

$$P(j_1, \ldots, j_n) = \prod_{i=1}^{n} P_i(j_i), \quad j_i = 0, 1,\ i = 1, \ldots, n$$

Now, note that shutting down the system when only one component is working
is equivalent to truncating the preceding unconstrained system to the set
consisting of all states except the one having all components down. Therefore,
with $P_T$ denoting a probability for the truncated system, we have from
Proposition 6.8 that

$$P_T(j_1, \ldots, j_n) = \frac{P(j_1, \ldots, j_n)}{1 - C}, \quad \sum_{i=1}^{n} j_i > 0$$

where

$$C = P(0, \ldots, 0) = \prod_{j=1}^{n} \lambda_j / (\mu_j + \lambda_j)$$

Hence, letting $(0, 1_i) = (0, \ldots, 0, 1, 0, \ldots, 0)$ be the $n$ vector of zeroes and ones
whose single 1 is in the $i$th place, we have

$$P_T(\text{system is shut down}) = \sum_{i=1}^{n} P_T(0, 1_i)$$

$$= \frac{1}{1 - C} \sum_{i=1}^{n} \Bigl( \frac{\mu_i}{\mu_i + \lambda_i} \Bigr) \prod_{j \ne i} \Bigl( \frac{\lambda_j}{\mu_j + \lambda_j} \Bigr)$$

$$= \frac{C \sum_{i=1}^{n} \mu_i / \lambda_i}{1 - C}$$

Let $R$ denote the number of components being repaired. Then with $I_i$ equal to 1
if component $i$ is being repaired and 0 otherwise, we have for the unconstrained
(nontruncated) system that

$$E[R] = E\Bigl[ \sum_{i=1}^{n} I_i \Bigr] = \sum_{i=1}^{n} P_i(0) = \sum_{i=1}^{n} \lambda_i / (\mu_i + \lambda_i)$$

But, in addition,

$$E[R] = E[R \mid \text{all components are in repair}]\, C + E[R \mid \text{not all components are in repair}]\, (1 - C)$$

$$= nC + E_T[R](1 - C)$$

implying that

$$E_T[R] = \frac{\sum_{i=1}^{n} \lambda_i / (\mu_i + \lambda_i) - nC}{1 - C}$$ ■
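Both answers of this example can be cross-checked by enumerating the $2^n - 1$ states of the truncated system and comparing with the closed forms. The sketch below does so; the function names and the three-component rates are illustrative assumptions.

```python
from itertools import product
from math import prod


def truncated_by_enumeration(lam, mu):
    """Shutdown proportion and mean number in repair, summing the truncated
    product-form probabilities over every state with at least one component up."""
    n = len(lam)
    p_up = [m / (m + l) for l, m in zip(lam, mu)]
    c = prod(1 - p for p in p_up)          # C = P(0, ..., 0), all components down
    shut, repairs = 0.0, 0.0
    for state in product((0, 1), repeat=n):
        up = sum(state)
        if up == 0:
            continue                        # excluded by the truncation
        p = prod(q if s else 1 - q for s, q in zip(state, p_up)) / (1 - c)
        if up == 1:
            shut += p                       # exactly one working: shut down
        repairs += p * (n - up)
    return shut, repairs


def truncated_closed_form(lam, mu):
    """The same two quantities from the formulas derived in the example."""
    n = len(lam)
    c = prod(l / (m + l) for l, m in zip(lam, mu))
    shut = c * sum(m / l for l, m in zip(lam, mu)) / (1 - c)
    repairs = (sum(l / (m + l) for l, m in zip(lam, mu)) - n * c) / (1 - c)
    return shut, repairs


# Illustrative rates (assumed) for n = 3 components.
lam, mu = [1.0, 2.0, 3.0], [2.0, 2.0, 2.0]
enumerated = truncated_by_enumeration(lam, mu)
closed = truncated_closed_form(lam, mu)
print(enumerated, closed)
```

For these rates $C = 0.1$, and both routes give a shutdown proportion of $11/27$.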
6.7 Uniformization


Consider a continuous-time Markov chain in which the mean time spent in a
state is the same for all states. That is, suppose that $v_i = v$, for all states $i$. In this
case since the amount of time spent in each state during a visit is exponentially
distributed with rate $v$, it follows that if we let $N(t)$ denote the number of state
transitions by time $t$, then $\{N(t), t \ge 0\}$ will be a Poisson process with rate $v$.
To compute the transition probabilities $P_{ij}(t)$, we can condition on $N(t)$:

$$P_{ij}(t) = P\{X(t) = j \mid X(0) = i\}$$

$$= \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, P\{N(t) = n \mid X(0) = i\}$$

$$= \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, e^{-vt} \frac{(vt)^n}{n!}$$

Now, the fact that there have been $n$ transitions by time $t$ tells us something about
the amount of time spent in each of the first $n$ states visited, but since the
distribution of time spent in each state is the same for all states, it follows that knowing
that $N(t) = n$ gives us no information about which states were visited. Hence,

$$P\{X(t) = j \mid X(0) = i, N(t) = n\} = P_{ij}^n$$

where $P_{ij}^n$ is just the $n$-stage transition probability associated with the discrete-time
Markov chain with transition probabilities $P_{ij}$; and so when $v_i \equiv v$

$$P_{ij}(t) = \sum_{n=0}^{\infty} P_{ij}^n\, e^{-vt} \frac{(vt)^n}{n!} \qquad (6.29)$$

Equation (6.29) is often useful from a computational point of view since it enables
us to approximate $P_{ij}(t)$ by taking a partial sum and then computing (by matrix
multiplication of the transition probability matrix) the relevant $n$-stage
probabilities $P_{ij}^n$.

Whereas the applicability of Equation (6.29) would appear to be quite limited
since it supposes that $v_i \equiv v$, it turns out that most Markov chains can be put in
that form by the trick of allowing fictitious transitions from a state to itself. To
see how this works, consider any Markov chain for which the $v_i$ are bounded,
and let $v$ be any number such that

$$v_i \le v, \quad \text{for all } i \qquad (6.30)$$

When in state $i$, the process actually leaves at rate $v_i$; but this is equivalent to
supposing that transitions occur at rate $v$, but only the fraction $v_i/v$ of transitions
are real ones (and thus real transitions occur at rate $v_i$) and the remaining fraction
$1 - v_i/v$ are fictitious transitions that leave the process in state $i$. In other words,
any Markov chain satisfying Condition (6.30) can be thought of as being a process
that spends an exponential amount of time with rate $v$ in state $i$ and then makes
a transition to $j$ with probability $P_{ij}^*$, where

$$P_{ij}^* = \begin{cases} 1 - \dfrac{v_i}{v}, & j = i \\[2mm] \dfrac{v_i}{v} P_{ij}, & j \ne i \end{cases} \qquad (6.31)$$

Hence, from Equation (6.29) we have that the transition probabilities can be
computed by

$$P_{ij}(t) = \sum_{n=0}^{\infty} P_{ij}^{*n}\, e^{-vt} \frac{(vt)^n}{n!}$$

where $P_{ij}^{*n}$ are the $n$-stage transition probabilities corresponding to Equation
(6.31). This technique of uniformizing the rate in which a transition occurs from
each state by introducing transitions from a state to itself is known as
uniformization.


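Example 6.21 Let us reconsider Example 6.11, which models the workings of a
machine (either on or off) as a two-state continuous-time Markov chain with

$$P_{01} = P_{10} = 1,$$

$$v_0 = \lambda, \quad v_1 = \mu$$

Letting $v = \lambda + \mu$, the uniformized version of the preceding is to consider it a
continuous-time Markov chain with

$$P_{00} = \frac{\mu}{\lambda + \mu} = 1 - P_{01},$$

$$P_{10} = \frac{\mu}{\lambda + \mu} = 1 - P_{11},$$

$$v_i = \lambda + \mu, \quad i = 0, 1$$

As $P_{00} = P_{10}$, it follows that the probability of a transition into state 0 is equal
to $\mu/(\lambda + \mu)$ no matter what the present state. Because a similar result is true for
state 1, it follows that the $n$-stage transition probabilities are given by

$$P_{i0}^n = \frac{\mu}{\lambda + \mu}, \quad n \ge 1,\ i = 0, 1$$

$$P_{i1}^n = \frac{\lambda}{\lambda + \mu}, \quad n \ge 1,\ i = 0, 1$$

Hence,

$$P_{00}(t) = \sum_{n=0}^{\infty} P_{00}^n\, e^{-(\lambda + \mu)t} \frac{[(\lambda + \mu)t]^n}{n!}$$

$$= e^{-(\lambda + \mu)t} + \sum_{n=1}^{\infty} \Bigl( \frac{\mu}{\lambda + \mu} \Bigr) e^{-(\lambda + \mu)t} \frac{[(\lambda + \mu)t]^n}{n!}$$

$$= e^{-(\lambda + \mu)t} + [1 - e^{-(\lambda + \mu)t}] \frac{\mu}{\lambda + \mu}$$

$$= \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t}$$

Similarly,

$$P_{11}(t) = \sum_{n=0}^{\infty} P_{11}^n\, e^{-(\lambda + \mu)t} \frac{[(\lambda + \mu)t]^n}{n!}$$

$$= e^{-(\lambda + \mu)t} + [1 - e^{-(\lambda + \mu)t}] \frac{\lambda}{\lambda + \mu}$$

$$= \frac{\lambda}{\lambda + \mu} + \frac{\mu}{\lambda + \mu} e^{-(\lambda + \mu)t}$$

The remaining probabilities are

$$P_{01}(t) = 1 - P_{00}(t) = \frac{\lambda}{\lambda + \mu} [1 - e^{-(\lambda + \mu)t}]$$

$$P_{10}(t) = 1 - P_{11}(t) = \frac{\mu}{\lambda + \mu} [1 - e^{-(\lambda + \mu)t}]$$ ■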
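The uniformization sum can also be evaluated numerically by truncating the Poisson-weighted series once almost all of the Poisson mass has been accumulated. The sketch below is one way to do this under stated assumptions: the function names, the tolerance, the term cap, and the rates $\lambda = 1$, $\mu = 2$ are all illustrative, and the rate matrix is given as a list of lists whose diagonal is ignored.

```python
from math import exp


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def uniformized_transition_matrix(q, t, v=None, tol=1e-12, max_terms=10_000):
    """Approximate P(t) by sum_n (P*)^n e^{-vt} (vt)^n / n!, where q[i][j] is
    the rate from i to j (i != j) and v is any bound with v_i <= v (assumed > 0)."""
    size = len(q)
    v_i = [sum(q[i][j] for j in range(size) if j != i) for i in range(size)]
    if v is None:
        v = max(v_i)
    # Equation (6.31): (v_i / v) P_ij = q_ij / v off the diagonal.
    p_star = [[1 - v_i[i] / v if i == j else q[i][j] / v for j in range(size)]
              for i in range(size)]
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    weight = exp(-v * t)                    # Poisson probability of n = 0
    result = [[weight * x for x in row] for row in power]
    mass = weight
    for n in range(1, max_terms):
        if 1.0 - mass <= tol:               # remaining Poisson mass bounds the error
            break
        power = mat_mul(power, p_star)
        weight *= v * t / n
        mass += weight
        result = [[r + weight * x for r, x in zip(r_row, p_row)]
                  for r_row, p_row in zip(result, power)]
    return result


# Two-state machine with illustrative rates (assumed): lam = 1, mu = 2, t = 0.5.
lam, mu, t = 1.0, 2.0, 0.5
p_t = uniformized_transition_matrix([[0.0, lam], [mu, 0.0]], t, v=lam + mu)
exact_p00 = mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * t)
print(p_t[0][0], exact_p00)
```

Because every entry of a power of $P^*$ is at most 1, the truncation error in each entry is bounded by the Poisson mass left out, which is why the stopping rule tracks `mass`.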


Example 6.22 Consider the two-state chain of Example 6.21 and suppose that
the initial state is state 0. Let $O(t)$ denote the total amount of time that the process
is in state 0 during the interval $(0, t)$. The random variable $O(t)$ is often called
the occupation time. We will now compute its mean.
If we let

$$I(s) = \begin{cases} 1, & \text{if } X(s) = 0 \\ 0, & \text{if } X(s) = 1 \end{cases}$$

then we can represent the occupation time by

$$O(t) = \int_0^t I(s)\, ds$$

Taking expectations and using the fact that we can take the expectation inside
the integral sign (since an integral is basically a sum), we obtain

$$E[O(t)] = \int_0^t E[I(s)]\, ds$$

$$= \int_0^t P\{X(s) = 0\}\, ds$$

$$= \int_0^t P_{00}(s)\, ds$$

$$= \frac{\mu}{\lambda + \mu}\, t + \frac{\lambda}{(\lambda + \mu)^2} \{1 - e^{-(\lambda + \mu)t}\}$$

where the final equality follows by integrating

$$P_{00}(s) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)s}$$

(For another derivation of $E[O(t)]$, see Exercise 38.) ■
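The closed form for the mean occupation time can be compared with a direct numerical integration of $P_{00}(s)$. This is a sketch only; the function names, the midpoint rule, and the parameter values are assumptions made for illustration.

```python
from math import exp


def p00(s, lam, mu):
    """P_00(s) for the two-state chain with v_0 = lam and v_1 = mu."""
    r = lam + mu
    return mu / r + lam / r * exp(-r * s)


def expected_occupation_time(t, lam, mu):
    """Closed form: mu t / (lam + mu) + lam (1 - e^{-(lam + mu) t}) / (lam + mu)^2."""
    r = lam + mu
    return mu / r * t + lam / r ** 2 * (1 - exp(-r * t))


def midpoint_integral(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


# Illustrative values (assumed): lam = 1, mu = 2, horizon t = 2.
lam, mu, t = 1.0, 2.0, 2.0
closed = expected_occupation_time(t, lam, mu)
numeric = midpoint_integral(lambda s: p00(s, lam, mu), 0.0, t)
print(closed, numeric)
```

The two values agree to well within the discretization error of the midpoint rule.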
For any pair of states $i$ and $j$, let

$$r_{ij} = \begin{cases} q_{ij}, & \text{if } i \ne j \\ -v_i, & \text{if } i = j \end{cases}$$
Using this notation, we can rewrite the Kolmogorov backward equations

$$P'_{ij}(t) = \sum_{k \ne i} q_{ik} P_{kj}(t) - v_i P_{ij}(t)$$

and the forward equations

$$P'_{ij}(t) = \sum_{k \ne j} q_{kj} P_{ik}(t) - v_j P_{ij}(t)$$

as follows:

$$P'_{ij}(t) = \sum_k r_{ik} P_{kj}(t) \quad \text{(backward)}$$

$$P'_{ij}(t) = \sum_k r_{kj} P_{ik}(t) \quad \text{(forward)}$$

This representation is especially revealing when we introduce matrix notation.
Define the matrices $R$ and $P(t)$, $P'(t)$ by letting the element in row $i$, column $j$ of
these matrices be, respectively, $r_{ij}$, $P_{ij}(t)$, and $P'_{ij}(t)$. Since the backward equations
say that the element in row $i$, column $j$ of the matrix $P'(t)$ can be obtained by
multiplying the $i$th row of the matrix $R$ by the $j$th column of the matrix $P(t)$, it
is equivalent to the matrix equation

$$P'(t) = R P(t) \qquad (6.32)$$

Similarly, the forward equations can be written as

$$P'(t) = P(t) R \qquad (6.33)$$

Now, just as the solution of the scalar differential equation

$$f'(t) = c f(t)$$

(or, equivalently, $f'(t) = f(t) c$) is

$$f(t) = f(0) e^{ct}$$

it can be shown that the solution of the matrix differential Equations (6.32) and
(6.33) is given by

$$P(t) = P(0) e^{Rt}$$

Since $P(0) = I$ (the identity matrix), this yields that

$$P(t) = e^{Rt} \qquad (6.34)$$

where the matrix $e^{Rt}$ is defined by

$$e^{Rt} = \sum_{n=0}^{\infty} R^n \frac{t^n}{n!} \qquad (6.35)$$

with $R^n$ being the (matrix) multiplication of $R$ by itself $n$ times.

The direct use of Equation (6.35) to compute $P(t)$ turns out to be very inefficient
for two reasons. First, since the matrix $R$ contains both positive and negative
elements (remember the off-diagonal elements are the $q_{ij}$ while the $i$th diagonal
element is $-v_i$), there is the problem of computer round-off error when we
compute the powers of $R$. Second, we often have to compute many of the terms in the
infinite sum (6.35) to arrive at a good approximation. However, there are certain
indirect ways that we can utilize the relation in (6.34) to efficiently approximate
the matrix $P(t)$. We now present two of these methods.


Approximation Method 1 Rather than using Equation (6.35) to compute $e^{Rt}$, we
can use the matrix equivalent of the identity

$$e^x = \lim_{n \to \infty} \Bigl( 1 + \frac{x}{n} \Bigr)^n$$

which states that

$$e^{Rt} = \lim_{n \to \infty} \Bigl( I + R \frac{t}{n} \Bigr)^n$$

Thus, if we let $n$ be a power of 2, say, $n = 2^k$, then we can approximate $P(t)$ by
raising the matrix $M = I + Rt/n$ to the $n$th power, which can be accomplished
by $k$ matrix multiplications (by first multiplying $M$ by itself to obtain $M^2$ and
then multiplying that by itself to obtain $M^4$ and so on). In addition, since only
the diagonal elements of $R$ are negative (and the diagonal elements of the identity
matrix $I$ are equal to 1), by choosing $n$ large enough we can guarantee that the
matrix $I + Rt/n$ has all nonnegative elements.
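Approximation Method 1 can be sketched in a few lines: form $M = I + Rt/2^k$ and square it $k$ times. The function names, the choice $k = 20$, and the two-state rate matrix below are assumptions made for illustration.

```python
from math import exp


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def approx_by_repeated_squaring(r, t, k=20):
    """Approximate P(t) = e^{Rt} by (I + R t / n)^n with n = 2**k, using k
    successive squarings M -> M^2 -> M^4 -> ..."""
    n = 2 ** k
    size = len(r)
    m = [[(1.0 if i == j else 0.0) + r[i][j] * t / n for j in range(size)]
         for i in range(size)]
    for _ in range(k):
        m = mat_mul(m, m)
    return m


# Two-state machine with illustrative rates (assumed): lam = 1, mu = 2, t = 0.5.
lam, mu, t = 1.0, 2.0, 0.5
rate_matrix = [[-lam, lam], [mu, -mu]]
p_t = approx_by_repeated_squaring(rate_matrix, t)
exact_p00 = mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * t)
print(p_t[0][0], exact_p00)
```

With $n = 2^{20}$ every entry of $M$ is nonnegative and each row sums to 1, so all intermediate powers remain stochastic matrices up to rounding.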


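Approximation Method 2 A second approach to approximating $e^{Rt}$ uses the
identity

$$e^{-Rt} = \lim_{n \to \infty} \Bigl( I - R \frac{t}{n} \Bigr)^n \approx \Bigl( I - R \frac{t}{n} \Bigr)^n \quad \text{for } n \text{ large}$$

and thus

$$P(t) = e^{Rt} \approx \Bigl( I - R \frac{t}{n} \Bigr)^{-n} = \Bigl[ \Bigl( I - R \frac{t}{n} \Bigr)^{-1} \Bigr]^n$$

Hence, if we again choose $n$ to be a large power of 2, say, $n = 2^k$, we can
approximate $P(t)$ by first computing the inverse of the matrix $I - Rt/n$ and then
raising that matrix to the $n$th power (by utilizing $k$ matrix multiplications). It can
be shown that the matrix $(I - Rt/n)^{-1}$ will have only nonnegative elements.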
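Approximation Method 2 differs from Method 1 only in that the matrix being squared is the inverse of $I - Rt/n$. The sketch below uses a small Gauss-Jordan routine for the inverse; the function names, the choice $k = 20$, and the two-state rates are illustrative assumptions.

```python
from math import exp


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def mat_inverse(a):
    """Gauss-Jordan inverse with partial pivoting (assumes a is invertible)."""
    size = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for row in range(size):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [row[size:] for row in aug]


def approx_by_inverse_powers(r, t, k=20):
    """Approximate P(t) = e^{Rt} by [(I - R t / n)^{-1}]^n with n = 2**k."""
    n = 2 ** k
    size = len(r)
    base = [[(1.0 if i == j else 0.0) - r[i][j] * t / n for j in range(size)]
            for i in range(size)]
    m = mat_inverse(base)
    for _ in range(k):
        m = mat_mul(m, m)
    return m


# Two-state machine with illustrative rates (assumed): lam = 1, mu = 2, t = 0.5.
lam, mu, t = 1.0, 2.0, 0.5
p_t = approx_by_inverse_powers([[-lam, lam], [mu, -mu]], t)
exact_p00 = mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * t)
print(p_t[0][0], exact_p00)
```

The matrix $I - Rt/n$ is strictly diagonally dominant, so the inverse exists for every $n$, not only for large $n$.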


Remark Both of the preceding computational approaches for approximating 
P(t) have probabilistic interpretations (see Exercises 41 and 42). 
Exercises


1. A population of organisms consists of both male and female members. In a small
colony any particular male is likely to mate with any particular female in any time
interval of length $h$, with probability $\lambda h + o(h)$. Each mating immediately produces
one offspring, equally likely to be male or female. Let $N_1(t)$ and $N_2(t)$ denote the
number of males and females in the population at $t$. Derive the parameters of the
continuous-time Markov chain $\{N_1(t), N_2(t)\}$, i.e., the $v_i$, $P_{ij}$ of Section 6.2.

2. Suppose that a one-celled organism can be in one of two states, either A or B. An
individual in state A will change to state B at an exponential rate $\alpha$; an individual in
state B divides into two new individuals of type A at an exponential rate $\beta$. Define
an appropriate continuous-time Markov chain for a population of such organisms
and determine the appropriate parameters for this model.

3. Consider two machines that are maintained by a single repairman. Machine $i$
functions for an exponential time with rate $\lambda_i$ before breaking down, $i = 1, 2$. The
repair times (for either machine) are exponential with rate $\mu$. Can we analyze this
as a birth and death process? If so, what are the parameters? If not, how can we
analyze it?

4. Potential customers arrive at a single-server station in accordance with a Poisson
process with rate $\lambda$. However, if the arrival finds $n$ customers already in the station,
then he will enter the system with probability $\alpha_n$. Assuming an exponential service
rate $\mu$, set this up as a birth and death process and determine the birth and death
rates.

5. There are $N$ individuals in a population, some of whom have a certain infection
that spreads as follows. Contacts between two members of this population occur
in accordance with a Poisson process having rate $\lambda$. When a contact occurs, it is
equally likely to involve any of the $\binom{N}{2}$ pairs of individuals in the population. If a
contact involves an infected and a noninfected individual, then with probability $p$
the noninfected individual becomes infected. Once infected, an individual remains
infected throughout. Let $X(t)$ denote the number of infected members of the
population at time $t$.

(a) Is $\{X(t), t \ge 0\}$ a continuous-time Markov chain?
(b) Specify its type.
(c) Starting with a single infected individual, what is the expected time until all
members are infected?

6. Consider a birth and death process with birth rates $\lambda_i = (i + 1)\lambda$, $i \ge 0$, and death
rates $\mu_i = i\mu$, $i \ge 0$.

(a) Determine the expected time to go from state 0 to state 4.
(b) Determine the expected time to go from state 2 to state 5.
(c) Determine the variances in parts (a) and (b).

7. Individuals join a club in accordance with a Poisson process with rate $\lambda$. Each new
member must pass through $k$ consecutive stages to become a full member of the club.
The time it takes to pass through each stage is exponentially distributed with rate
$\mu$. Let $N_i(t)$ denote the number of club members at time $t$ who have passed through
exactly $i$ stages, $i = 1, \ldots, k - 1$. Also, let $N(t) = (N_1(t), N_2(t), \ldots, N_{k-1}(t))$.

(a) Is $\{N(t), t \ge 0\}$ a continuous-time Markov chain?
(b) If so, give the infinitesimal transition rates. That is, for any state $n =
(n_1, \ldots, n_{k-1})$ give the possible next states along with their infinitesimal rates.


8. Consider two machines, both of which have an exponential lifetime with mean $1/\lambda$.
There is a single repairman that can service machines at an exponential rate $\mu$. Set
up the Kolmogorov backward equations; you need not solve them.

9. The birth and death process with parameters $\lambda_n = 0$ and $\mu_n = \mu$, $n > 0$, is called a
pure death process. Find $P_{ij}(t)$.

10. Consider two machines. Machine $i$ operates for an exponential time with rate $\lambda_i$
and then fails; its repair time is exponential with rate $\mu_i$, $i = 1, 2$. The machines act
independently of each other. Define a four-state continuous-time Markov chain that
jointly describes the condition of the two machines. Use the assumed independence
to compute the transition probabilities for this chain and then verify that these
transition probabilities satisfy the forward and backward equations.


*11. Consider a Yule process starting with a single individual; that is, suppose $X(0) = 1$.
Let $T_i$ denote the time it takes the process to go from a population of size $i$ to one
of size $i + 1$.

(a) Argue that $T_i$, $i = 1, \ldots, j$, are independent exponentials with respective rates
$i\lambda$.
(b) Let $X_1, \ldots, X_j$ denote independent exponential random variables each having
rate $\lambda$, and interpret $X_i$ as the lifetime of component $i$. Argue that
$\max(X_1, \ldots, X_j)$ can be expressed as

$$\max(X_1, \ldots, X_j) = \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_j$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_j$ are independent exponentials with respective rates $j\lambda$,
$(j - 1)\lambda, \ldots, \lambda$.

Hint: Interpret $\varepsilon_i$ as the time between the $i - 1$ and the $i$th failure.

(c) Using (a) and (b) argue that

$$P\{T_1 + \cdots + T_j \le t\} = (1 - e^{-\lambda t})^j$$

(d) Use (c) to obtain

$$P_{1j}(t) = (1 - e^{-\lambda t})^{j-1} - (1 - e^{-\lambda t})^j = e^{-\lambda t} (1 - e^{-\lambda t})^{j-1}$$

and hence, given $X(0) = 1$, $X(t)$ has a geometric distribution with parameter
$p = e^{-\lambda t}$.
(e) Now conclude that

$$P_{ij}(t) = \binom{j - 1}{i - 1} e^{-\lambda t i} (1 - e^{-\lambda t})^{j - i}$$


12. Each individual in a biological population is assumed to give birth at an exponential
rate $\lambda$, and to die at an exponential rate $\mu$. In addition, there is an exponential rate
of increase $\theta$ due to immigration. However, immigration is not allowed when the
population size is $N$ or larger.

(a) Set this up as a birth and death model.
(b) If $N = 3$, $1 = \theta = \lambda$, $\mu = 2$, determine the proportion of time that immigration
is restricted.


13. A small barbershop, operated by a single barber, has room for at most two
customers. Potential customers arrive at a Poisson rate of three per hour, and the
successive service times are independent exponential random variables with mean
$\frac{1}{4}$ hour.

(a) What is the average number of customers in the shop?
(b) What is the proportion of potential customers that enter the shop?
(c) If the barber could work twice as fast, how much more business would he do?

14. Potential customers arrive at a full-service, one-pump gas station at a Poisson rate of
20 cars per hour. However, customers will only enter the station for gas if there are
no more than two cars (including the one currently being attended to) at the pump.
Suppose the amount of time required to service a car is exponentially distributed
with a mean of five minutes.

(a) What fraction of the attendant's time will be spent servicing cars?
(b) What fraction of potential customers are lost?

15. A service center consists of two servers, each working at an exponential rate of
two services per hour. If customers arrive at a Poisson rate of three per hour, then,
assuming a system capacity of at most three customers,

(a) what fraction of potential customers enter the system?
(b) what would the value of part (a) be if there was only a single server, and his
rate was twice as fast (that is, $\mu = 4$)?


*16. The following problem arises in molecular biology. The surface of a bacterium
consists of several sites at which foreign molecules (some acceptable and some
not) become attached. We consider a particular site and assume that molecules
arrive at the site according to a Poisson process with parameter $\lambda$. Among these
molecules a proportion $\alpha$ is acceptable. Unacceptable molecules stay at the site for
a length of time that is exponentially distributed with parameter $\mu_1$, whereas an
acceptable molecule remains at the site for an exponential time with rate $\mu_2$. An
arriving molecule will become attached only if the site is free of other molecules.
What percentage of time is the site occupied with an acceptable (unacceptable)
molecule?


17. Each time a machine is repaired it remains up for an exponentially distributed time
with rate $\lambda$. It then fails, and its failure is either of two types. If it is a type 1 failure,
then the time to repair the machine is exponential with rate $\mu_1$; if it is a type 2 failure,
then the repair time is exponential with rate $\mu_2$. Each failure is, independently of
the time it took the machine to fail, a type 1 failure with probability $p$ and a type 2
failure with probability $1 - p$. What proportion of time is the machine down due to
a type 1 failure? What proportion of time is it down due to a type 2 failure? What
proportion of time is it up?

18. After being repaired, a machine functions for an exponential time with rate $\lambda$ and
then fails. Upon failure, a repair process begins. The repair process proceeds
sequentially through $k$ distinct phases. First a phase 1 repair must be performed, then a
phase 2, and so on. The times to complete these phases are independent, with phase
$i$ taking an exponential time with rate $\mu_i$, $i = 1, \ldots, k$.

(a) What proportion of time is the machine undergoing a phase $i$ repair?

(b) What proportion of time is the machine working?


19. A single repairperson looks after both machines 1 and 2. Each time it is repaired,
machine $i$ stays up for an exponential time with rate $\lambda_i$, $i = 1, 2$. When machine
$i$ fails, it requires an exponentially distributed amount of work with rate $\mu_i$ to
complete its repair. The repairperson will always service machine 1 when it is down. 
For instance, if machine 1 fails while 2 is being repaired, then the repairperson will 
immediately stop work on machine 2 and start on 1. What proportion of time is 
machine 2 down? 


20. There are two machines, one of which is used as a spare. A working machine will
function for an exponential time with rate $\lambda$ and will then fail. Upon failure, it is
immediately replaced by the other machine if that one is in working order, and it
goes to the repair facility. The repair facility consists of a single person who takes
an exponential time with rate $\mu$ to repair a failed machine. At the repair facility, the
newly failed machine enters service if the repairperson is free. If the repairperson 
is busy, it waits until the other machine is fixed; at that time, the newly repaired 
machine is put in service and repair begins on the other one. Starting with both 
machines in working condition, find 

(a) the expected value and 

(b) the variance of the time until both are in the repair facility. 

(c) In the long run, what proportion of time is there a working machine? 


21. Suppose that when both machines are down in Exercise 20 a second repairperson is
called in to work on the newly failed one. Suppose all repair times remain
exponential with rate $\mu$. Now find the proportion of time at least one machine is working,
and compare your answer with the one obtained in Exercise 20.


22. Customers arrive at a single-server queue in accordance with a Poisson process
having rate $\lambda$. However, an arrival that finds $n$ customers already in the system will
only join the system with probability $1/(n + 1)$. That is, with probability $n/(n + 1)$
such an arrival will not join the system. Show that the limiting distribution of the
number of customers in the system is Poisson with mean $\lambda/\mu$.


23. A job shop consists of three machines and two repairmen. The amount of time a
machine works before breaking down is exponentially distributed with mean 10. 
If the amount of time it takes a single repairman to fix a machine is exponentially 
distributed with mean 8, then 

(a) what is the average number of machines not in use? 

(b) what proportion of time are both repairmen busy? 


*24. Consider a taxi station where taxis and customers arrive in accordance with Poisson
processes with respective rates of one and two per minute. A taxi will wait no matter 
how many other taxis are present. However, an arriving customer that does not 
find a taxi waiting leaves. Find 

(a) the average number of taxis waiting, and 

(b) the proportion of arriving customers that get taxis. 


25. Customers arrive at a service station, manned by a single server who serves at an
exponential rate $\mu_1$, at a Poisson rate $\lambda$. After completion of service the customer
then joins a second system where the server serves at an exponential rate $\mu_2$. Such
a system is called a tandem or sequential queueing system. Assuming that $\lambda < \mu_i$,
$i = 1, 2$, determine the limiting probabilities.


Hint: Try a solution of the form $P_{n,m} = C\alpha^n \beta^m$, and determine $C$, $\alpha$, $\beta$.


26. Consider an ergodic M/M/s queue in steady state (that is, after a long time) and
argue that the number presently in the system is independent of the sequence of past 
departure times. That is, for instance, knowing that there have been departures 2, 
3, 5, and 10 time units ago does not affect the distribution of the number presently 
in the system. 


27. In the M/M/s queue if you allow the service rate to depend on the number in the
system (but in such a way so that it is ergodic), what can you say about the output
process? What can you say when the service rate $\mu$ remains unchanged but $\lambda > s\mu$?


28. If $\{X(t)\}$ and $\{Y(t)\}$ are independent continuous-time Markov chains, both of which
are time reversible, show that the process $\{X(t), Y(t)\}$ is also a time reversible
Markov chain.


29. Consider a set of $n$ machines and a single repair facility to service these machines.
Suppose that when machine $i$, $i = 1, \ldots, n$, fails it requires an exponentially
distributed amount of work with rate $\mu_i$ to repair it. The repair facility divides its
efforts equally among all failed machines in the sense that whenever there are $k$
failed machines each one receives work at a rate of $1/k$ per unit time. If there are
a total of $r$ working machines, including machine $i$, then $i$ fails at an instantaneous
rate $\lambda_i/r$.

(a) Define an appropriate state space so as to be able to analyze the preceding
system as a continuous-time Markov chain.

(b) Give the instantaneous transition rates (that is, give the $q_{ij}$).

(c) Write the time reversibility equations. 

(d) Find the limiting probabilities and show that the process is time reversible. 


30. Consider a graph with nodes $1, 2, \ldots, n$ and the $\binom{n}{2}$ arcs $(i, j)$, $i \ne j$, $i, j = 1, \ldots, n$.
(See Section 3.6.2 for appropriate definitions.) Suppose that a particle moves along
this graph as follows: Events occur along the arcs $(i, j)$ according to independent
Poisson processes with rates $\lambda_{ij}$. An event along arc $(i, j)$ causes that arc to become
excited. If the particle is at node $i$ at the moment that $(i, j)$ becomes excited, it
instantaneously moves to node $j$, $i, j = 1, \ldots, n$. Let $P_j$ denote the proportion of
time that the particle is at node $j$. Show that

$$P_j = \frac{1}{n}$$

Hint: Use time reversibility.


31. A total of $N$ customers move about among $r$ servers in the following manner. When
a customer is served by server $i$, he then goes over to server $j$, $j \ne i$, with probability
$1/(r - 1)$. If the server he goes to is free, then the customer enters service; otherwise
he joins the queue. The service times are all independent, with the service times
at server $i$ being exponential with rate $\mu_i$, $i = 1, \ldots, r$. Let the state at any time
be the vector $(n_1, \ldots, n_r)$, where $n_i$ is the number of customers presently at server
$i$, $i = 1, \ldots, r$, $\sum_i n_i = N$.

(a) Argue that if $X(t)$ is the state at time $t$, then $\{X(t), t \ge 0\}$ is a continuous-time
Markov chain.

(b) Give the infinitesimal rates of this chain. 

(c) Show that this chain is time reversible, and find the limiting probabilities. 


32. Customers arrive at a two-server station in accordance with a Poisson process
having rate $\lambda$. Upon arriving, they join a single queue. Whenever a server completes
a service, the person first in line enters service. The service times of server $i$ are
exponential with rate $\mu_i$, $i = 1, 2$, where $\mu_1 + \mu_2 > \lambda$. An arrival finding both
servers free is equally likely to go to either one. Define an appropriate continuous- 
time Markov chain for this model, show it is time reversible, and find the limiting 
probabilities. 


33. Consider two M/M/1 queues with respective parameters $\lambda_i$, $\mu_i$, $i = 1, 2$. Suppose
they share a common waiting room that can hold at most three customers. That is,
whenever an arrival finds her server busy and three customers in the waiting room,
she goes away. Find the limiting probability that there will be $n$ queue 1 customers
and $m$ queue 2 customers in the system.


Hint: Use the results of Exercise 28 together with the concept of truncation. 


34. Four workers share an office that contains four telephones. At any time, each worker
is either “working” or “on the phone.” Each “working” period of worker $i$ lasts
for an exponentially distributed time with rate $\lambda_i$, and each “on the phone” period
lasts for an exponentially distributed time with rate $\mu_i$, $i = 1, 2, 3, 4$.
(a) What proportion of time are all workers “working”?
Let $X_i(t)$ equal 1 if worker $i$ is working at time $t$, and let it be 0 otherwise.
Let $X(t) = (X_1(t), X_2(t), X_3(t), X_4(t))$.
(b) Argue that $\{X(t), t \ge 0\}$ is a continuous-time Markov chain and give its
infinitesimal rates.
(c) Is $\{X(t)\}$ time reversible? Why or why not?
Suppose now that one of the phones has broken down. Suppose that a worker who 
is about to use a phone but finds them all being used begins a new “working” 
period. 
(d) What proportion of time are all workers “working”? 


35. Consider a time reversible continuous-time Markov chain having infinitesimal
transition rates $q_{ij}$ and limiting probabilities $\{P_i\}$. Let $A$ denote a set of states for this
chain, and consider a new continuous-time Markov chain with transition rates $q^*_{ij}$
given by

$$q^*_{ij} = \begin{cases} c\,q_{ij}, & \text{if } i \in A,\ j \notin A \\ q_{ij}, & \text{otherwise} \end{cases}$$

where $c$ is an arbitrary positive number. Show that this chain remains time
reversible, and find its limiting probabilities.


36. Consider a system of $n$ components such that the working times of component
$i$, $i = 1, \ldots, n$, are exponentially distributed with rate $\lambda_i$. When a component fails,
however, the repair rate of component $i$ depends on how many other components
are down. Specifically, suppose that the instantaneous repair rate of component
$i$, $i = 1, \ldots, n$, when there are a total of $k$ failed components, is $\alpha^k \mu_i$.

(a) Explain how we can analyze the preceding as a continuous-time Markov chain.
Define the states and give the parameters of the chain.

(b) Show that, in steady state, the chain is time reversible and compute the limiting 
probabilities. 


37. For the continuous-time Markov chain of Exercise 3 present a uniformized version.


38. In Example 6.20, we computed $m(t) = E[O(t)]$, the expected occupation time in
state 0 by time $t$ for the two-state continuous-time Markov chain starting in state
0. Another way of obtaining this quantity is by deriving a differential equation
for it.

(a) Show that

$$m(t + h) = m(t) + P_{00}(t)h + o(h)$$

(b) Show that

$$m'(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t}$$

(c) Solve for $m(t)$.

39. Let $O(t)$ be the occupation time for state 0 in the two-state continuous-time Markov
chain. Find $E[O(t) \mid X(0) = 1]$.

40. Consider the two-state continuous-time Markov chain. Starting in state 0, find
$\mathrm{Cov}[X(s), X(t)]$.
41. Let $Y$ denote an exponential random variable with rate $\lambda$ that is independent of the
continuous-time Markov chain $\{X(t)\}$ and let

$$\bar{P}_{ij} = P\{X(Y) = j \mid X(0) = i\}$$

(a) Show that

$$\bar{P}_{ij} = \frac{1}{v_i + \lambda} \sum_k q_{ik} \bar{P}_{kj} + \frac{\lambda}{v_i + \lambda} \delta_{ij}$$

where $\delta_{ij}$ is 1 when $i = j$ and 0 when $i \ne j$.

(b) Show that the solution of the preceding set of equations is given by

$$\bar{P} = (I - R/\lambda)^{-1}$$

where $\bar{P}$ is the matrix of elements $\bar{P}_{ij}$, $I$ is the identity matrix, and $R$ the matrix
specified in Section 6.8.

(c) Suppose now that $Y_1, \ldots, Y_n$ are independent exponentials with rate $\lambda$ that
are independent of $\{X(t)\}$. Show that

$$P\{X(Y_1 + \cdots + Y_n) = j \mid X(0) = i\}$$

is equal to the element in row $i$, column $j$ of the matrix $\bar{P}^n$.

(d) Explain the relationship of the preceding to Approximation 2 of Section 6.8.

*42. (a) Show that Approximation 1 of Section 6.8 is equivalent to uniformizing the
continuous-time Markov chain with a value $v$ such that $vt = n$ and then
approximating $P_{ij}(t)$ by $P^{*n}_{ij}$.

(b) Explain why the preceding should make a good approximation.


Hint: What is the standard deviation of a Poisson random variable with mean $n$?


Renewal Theory and Its Applications


7.1 Introduction


We have seen that a Poisson process is a counting process for which the times 
between successive events are independent and identically distributed exponential 
random variables. One possible generalization is to consider a counting process 
for which the times between successive events are independent and identically 
distributed with an arbitrary distribution. Such a counting process is called a 
renewal process. 

Let $\{N(t), t \ge 0\}$ be a counting process and let $X_n$ denote the time between
the $(n - 1)$st and the $n$th event of this process, $n \ge 1$.


Definition 7.1 If the sequence of nonnegative random variables $\{X_1, X_2, \ldots\}$ is
independent and identically distributed, then the counting process $\{N(t), t \ge 0\}$
is said to be a renewal process.


Thus, a renewal process is a counting process such that the time until the first 
event occurs has some distribution F, the time between the first and second event 
has, independently of the time of the first event, the same distribution F, and 
so on. When an event occurs, we say that a renewal has taken place. 

For an example of a renewal process, suppose that we have an infinite supply of 
lightbulbs whose lifetimes are independent and identically distributed. Suppose 
also that we use a single lightbulb at a time, and when it fails we immediately 
replace it with a new one. Under these conditions, $\{N(t), t \ge 0\}$ is a renewal
process when $N(t)$ represents the number of lightbulbs that have failed by time $t$.

Figure 7.1 Renewal and interarrival times.


For a renewal process having interarrival times $X_1, X_2, \ldots$, let

$$S_0 = 0, \qquad S_n = \sum_{i=1}^{n} X_i, \quad n \ge 1$$

That is, $S_1 = X_1$ is the time of the first renewal; $S_2 = X_1 + X_2$ is the time until
the first renewal plus the time between the first and second renewal, that is, $S_2$ is
the time of the second renewal. In general, $S_n$ denotes the time of the $n$th renewal
(see Figure 7.1).

We shall let $F$ denote the interarrival distribution and to avoid trivialities, we
assume that $F(0) = P\{X_n = 0\} < 1$. Furthermore, we let

$$\mu = E[X_n], \quad n \ge 1$$

be the mean time between successive renewals. It follows from the nonnegativity
of $X_n$ and the fact that $X_n$ is not identically 0 that $\mu > 0$.

The first question we shall attempt to answer is whether an infinite number of
renewals can occur in a finite amount of time. That is, can $N(t)$ be infinite for
some (finite) value of $t$? To show that this cannot occur, we first note that, as $S_n$
is the time of the $n$th renewal, $N(t)$ may be written as

$$N(t) = \max\{n : S_n \le t\} \tag{7.1}$$

To understand why Equation (7.1) is valid, suppose, for instance, that $S_4 \le t$ but
$S_5 > t$. Hence, the fourth renewal had occurred by time $t$ but the fifth renewal
occurred after time $t$; or in other words, $N(t)$, the number of renewals that
occurred by time $t$, must equal 4. Now by the strong law of large numbers it
follows that, with probability 1,


$$\frac{S_n}{n} \to \mu \quad \text{as } n \to \infty$$

But since $\mu > 0$ this means that $S_n$ must be going to infinity as $n$ goes to infinity.
Thus, $S_n$ can be less than or equal to $t$ for at most a finite number of values of $n$,
and hence by Equation (7.1), $N(t)$ must be finite.
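
Equation (7.1) translates directly into a computation on a finite record of interarrival times. The following Python sketch is only an illustration: the function name `count_renewals` and the sample data are assumptions introduced here. It forms the renewal times $S_n$ as running sums and counts how many of them fall at or before $t$.

```python
from bisect import bisect_right
from itertools import accumulate


def count_renewals(interarrival_times, t):
    """Return N(t) = max{n : S_n <= t} for the observed interarrival times.

    Only the supplied interarrival times are used, so the answer is
    reliable only when t does not exceed their total.
    """
    renewal_times = list(accumulate(interarrival_times))  # S_1, S_2, ...
    # Interarrival times are nonnegative, so renewal_times is nondecreasing
    # and bisect_right counts the entries that are <= t.
    return bisect_right(renewal_times, t)


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, 3.0]        # hypothetical X_1, ..., X_4
    print(count_renewals(x, 3.4))   # S_2 = 3.0 <= 3.4 < S_3 = 3.5
```

Because the running sums are sorted, a binary search suffices; a linear scan would give the same count.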

However, though $N(t) < \infty$ for each $t$, it is true that, with probability 1,

$$N(\infty) \equiv \lim_{t \to \infty} N(t) = \infty$$

This follows since the only way in which $N(\infty)$, the total number of renewals
that occur, can be finite is for one of the interarrival times to be infinite.
Therefore,

$$\begin{aligned}
P\{N(\infty) < \infty\} &= P\{X_n = \infty \text{ for some } n\} \\
&= P\left\{\bigcup_{n=1}^{\infty} \{X_n = \infty\}\right\} \\
&\le \sum_{n=1}^{\infty} P\{X_n = \infty\} \\
&= 0
\end{aligned}$$


7.2 Distribution of N(t) 


The distribution of N(t) can be obtained, at least in theory, by first noting the 
important relationship that the number of renewals by time t is greater than or 
equal to $n$ if and only if the $n$th renewal occurs before or at time $t$. That is,

$$N(t) \ge n \iff S_n \le t \tag{7.2}$$

From Equation (7.2) we obtain

$$\begin{aligned}
P\{N(t) = n\} &= P\{N(t) \ge n\} - P\{N(t) \ge n + 1\} \\
&= P\{S_n \le t\} - P\{S_{n+1} \le t\}
\end{aligned} \tag{7.3}$$

Now, since the random variables $X_i$, $i \ge 1$, are independent and have a common
distribution $F$, it follows that $S_n = \sum_{i=1}^{n} X_i$ is distributed as $F_n$, the $n$-fold
convolution of $F$ with itself (Section 2.5). Therefore, from Equation (7.3) we
obtain

$$P\{N(t) = n\} = F_n(t) - F_{n+1}(t)$$


Example 7.1 Suppose that $P\{X_n = i\} = p(1 - p)^{i-1}$, $i \ge 1$. That is, suppose that
the interarrival distribution is geometric. Now $S_1 = X_1$ may be interpreted as the
number of trials necessary to get a single success when each trial is independent
and has a probability $p$ of being a success. Similarly, $S_n$ may be interpreted as
the number of trials necessary to attain $n$ successes, and hence has the negative
binomial distribution

$$P\{S_n = k\} = \begin{cases} \binom{k-1}{n-1} p^n (1 - p)^{k-n}, & k \ge n \\ 0, & k < n \end{cases}$$

Thus, from Equation (7.3) we have that

$$P\{N(t) = n\} = \sum_{k=n}^{[t]} \binom{k-1}{n-1} p^n (1 - p)^{k-n} - \sum_{k=n+1}^{[t]} \binom{k-1}{n} p^{n+1} (1 - p)^{k-n-1}$$

Equivalently, since an event independently occurs with probability $p$ at each of
the times $1, 2, \ldots$

$$P\{N(t) = n\} = \binom{[t]}{n} p^n (1 - p)^{[t]-n}$$

■
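
The two expressions in Example 7.1 can be compared numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $t \ge 0$ and $0 < p < 1$ are assumed, and $F_0(t)$ is taken to be 1 because $S_0 = 0$. It evaluates $F_n(t) - F_{n+1}(t)$ from the negative binomial probabilities and, separately, the binomial form.

```python
from math import comb, floor


def pmf_from_convolutions(n, t, p):
    """P{N(t) = n} = F_n(t) - F_{n+1}(t) for geometric(p) interarrival times."""
    m = floor(t)

    def cdf_of_sum(r):
        # F_r(t) = P{S_r <= t}; S_0 = 0, and S_r is negative binomial for r >= 1.
        if r == 0:
            return 1.0
        return sum(comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)
                   for k in range(r, m + 1))

    return cdf_of_sum(n) - cdf_of_sum(n + 1)


def pmf_binomial(n, t, p):
    """P{N(t) = n} written as a binomial([t], p) probability."""
    m = floor(t)
    return comb(m, n) * p**n * (1 - p) ** (m - n)


if __name__ == "__main__":
    for n in range(4):
        print(n, pmf_from_convolutions(n, 7.3, 0.3), pmf_binomial(n, 7.3, 0.3))
```

For each $n$ the two printed columns should agree up to floating-point rounding.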


Another expression for $P(N(t) = n)$ can be obtained by conditioning on $S_n$.
This yields

$$P(N(t) = n) = \int_0^{\infty} P(N(t) = n \mid S_n = y) f_{S_n}(y)\,dy$$

Now, if the $n$th event occurred at time $y > t$, then there would have been less
than $n$ events by time $t$. On the other hand, if it occurred at a time $y \le t$, then
there would be exactly $n$ events by time $t$ if the next interarrival exceeds $t - y$.
Consequently,

$$\begin{aligned}
P(N(t) = n) &= \int_0^t P(X_{n+1} > t - y \mid S_n = y) f_{S_n}(y)\,dy \\
&= \int_0^t \bar{F}(t - y) f_{S_n}(y)\,dy
\end{aligned}$$

where $\bar{F} = 1 - F$.


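Example 7.2 If $F(x) = 1 - e^{-\lambda x}$ then $S_n$, being the sum of $n$ independent
exponentials with rate $\lambda$, will have a gamma $(n, \lambda)$ distribution. Consequently, the
preceding identity gives

$$\begin{aligned}
P(N(t) = n) &= \int_0^t e^{-\lambda(t - y)} \frac{\lambda e^{-\lambda y} (\lambda y)^{n-1}}{(n-1)!}\,dy \\
&= \frac{\lambda^n e^{-\lambda t}}{(n-1)!} \int_0^t y^{n-1}\,dy \\
&= e^{-\lambda t} \frac{(\lambda t)^n}{n!}
\end{aligned}$$

By using Equation (7.2) we can calculate $m(t)$, the mean value of $N(t)$, as

$$\begin{aligned}
m(t) = E[N(t)] &= \sum_{n=1}^{\infty} P\{N(t) \ge n\} \\
&= \sum_{n=1}^{\infty} P\{S_n \le t\} \\
&= \sum_{n=1}^{\infty} F_n(t)
\end{aligned}$$

where we have used the fact that if $X$ is nonnegative and integer valued, then

$$\begin{aligned}
E[X] = \sum_{k=1}^{\infty} k P\{X = k\} &= \sum_{k=1}^{\infty} \sum_{n=1}^{k} P\{X = k\} \\
&= \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} P\{X = k\} = \sum_{n=1}^{\infty} P\{X \ge n\}
\end{aligned}$$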

The function m(t) is known as the mean-value or the renewal function. 
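
As a numerical check of the series $m(t) = \sum_{n \ge 1} F_n(t)$, consider exponential interarrival times with rate $\lambda$, for which the renewal process is a Poisson process and $m(t) = \lambda t$. The sketch below is illustrative: the function names and the truncation level `terms` are assumptions, and the truncated sum is accurate only when `terms` is large compared with $\lambda t$.

```python
import math


def erlang_cdf(n, lam, t):
    """F_n(t) = P{S_n <= t} when the X_i are exponential with rate lam."""
    term = math.exp(-lam * t)    # probability of 0 events by time t
    below_n = 0.0
    for k in range(n):
        below_n += term           # adds P{exactly k events by time t}
        term *= lam * t / (k + 1)
    return 1.0 - below_n          # P{N(t) >= n} = P{S_n <= t}


def renewal_function(lam, t, terms=200):
    """Approximate m(t) = sum_{n >= 1} F_n(t) by its first `terms` summands."""
    return sum(erlang_cdf(n, lam, t) for n in range(1, terms + 1))


if __name__ == "__main__":
    print(renewal_function(2.0, 3.0))   # should be close to lam * t = 6
```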

It can be shown that the mean-value function m(t) uniquely determines the 
renewal process. Specifically, there is a one-to-one correspondence between the 
interarrival distributions F and the mean-value functions m(t). 

Another interesting result that we state without proof is that 


$$m(t) < \infty \quad \text{for all } t < \infty$$


Remarks 


(i) Since m(t) uniquely determines the interarrival distribution, it follows that the Poisson 
process is the only renewal process having a linear mean-value function. 

(ii) Some readers might think that the finiteness of $m(t)$ should follow directly from the
fact that, with probability 1, $N(t)$ is finite. However, such reasoning is not valid;
consider the following: Let $Y$ be a random variable having the following probability
distribution:

$$Y = 2^n \text{ with probability } \left(\tfrac{1}{2}\right)^n, \quad n \ge 1$$

Now,

$$P\{Y < \infty\} = \sum_{n=1}^{\infty} P\{Y = 2^n\} = \sum_{n=1}^{\infty} \left(\tfrac{1}{2}\right)^n = 1$$

But

$$E[Y] = \sum_{n=1}^{\infty} 2^n P\{Y = 2^n\} = \sum_{n=1}^{\infty} 2^n \left(\tfrac{1}{2}\right)^n = \infty$$

Hence, even when $Y$ is finite, it can still be true that $E[Y] = \infty$.


An integral equation satisfied by the renewal function can be obtained by
conditioning on the time of the first renewal. Assuming that the interarrival distribution
$F$ is continuous with density function $f$ this yields

$$m(t) = E[N(t)] = \int_0^{\infty} E[N(t) \mid X_1 = x] f(x)\,dx \tag{7.4}$$

Now suppose that the first renewal occurs at a time $x$ that is less than $t$. Then,
using the fact that a renewal process probabilistically starts over when a renewal
occurs, it follows that the number of renewals by time $t$ would have the same
distribution as 1 plus the number of renewals in the first $t - x$ time units. Therefore,

$$E[N(t) \mid X_1 = x] = 1 + E[N(t - x)] \quad \text{if } x < t$$

Since, clearly

$$E[N(t) \mid X_1 = x] = 0 \quad \text{when } x > t$$

we obtain from Equation (7.4) that

$$\begin{aligned}
m(t) &= \int_0^t [1 + m(t - x)] f(x)\,dx \\
&= F(t) + \int_0^t m(t - x) f(x)\,dx
\end{aligned} \tag{7.5}$$


Equation (7.5) is called the renewal equation and can sometimes be solved to 
obtain the renewal function. 


Example 7.3 One instance in which the renewal equation can be solved is when
the interarrival distribution is uniform, say, uniform on (0, 1). We will now
present a solution in this case when $t \le 1$. For such values of $t$, the renewal
function becomes

$$\begin{aligned}
m(t) &= t + \int_0^t m(t - x)\,dx \\
&= t + \int_0^t m(y)\,dy \quad \text{by the substitution } y = t - x
\end{aligned}$$

Differentiating the preceding equation yields

$$m'(t) = 1 + m(t)$$

Letting $h(t) = 1 + m(t)$, we obtain

$$h'(t) = h(t)$$

or

$$\log h(t) = t + C$$

or

$$h(t) = Ke^t$$

or

$$m(t) = Ke^t - 1$$

Since $m(0) = 0$, we see that $K = 1$, and so we obtain

$$m(t) = e^t - 1, \quad 0 \le t \le 1$$

■
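
When no closed form is available, the renewal equation $m(t) = F(t) + \int_0^t m(t - x) f(x)\,dx$ can be approximated on a grid. The sketch below is one possible discretization, not a method taken from the text: it applies the trapezoidal rule, and the function name, grid size, and the requirement that $f$ can be evaluated at 0 are all assumptions. It is checked against the uniform (0, 1) case, where $m(t) = e^t - 1$ for $0 \le t \le 1$.

```python
import math


def solve_renewal_equation(F, f, t_max, steps):
    """Approximate m on the grid 0, h, ..., t_max, where h = t_max / steps.

    The integral int_0^t m(t - x) f(x) dx is replaced by the trapezoidal
    rule; the x = 0 endpoint involves m(t) itself and is moved to the left side.
    """
    h = t_max / steps
    m = [0.0] * (steps + 1)          # m(0) = 0
    for i in range(1, steps + 1):
        interior = sum(m[i - j] * f(j * h) for j in range(1, i))
        # The x = t endpoint contributes m(0) * f(t) / 2 = 0.
        m[i] = (F(i * h) + h * interior) / (1.0 - 0.5 * h * f(0.0))
    return m


if __name__ == "__main__":
    # Uniform (0, 1) interarrival times, restricted to t <= 1.
    grid = solve_renewal_equation(lambda t: t, lambda x: 1.0, 1.0, 1000)
    print(grid[-1], math.e - 1)      # the two values should nearly agree
```

The cost grows with the square of `steps`, which is acceptable for a check of this size.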


7.3 Limit Theorems and Their Applications 


We have shown previously that, with probability 1, $N(t)$ goes to infinity as $t$ goes
to infinity. However, it would be nice to know the rate at which $N(t)$ goes to
infinity. That is, we would like to be able to say something about $\lim_{t \to \infty} N(t)/t$.

As a prelude to determining the rate at which $N(t)$ grows, let us first consider
the random variable $S_{N(t)}$. In words, just what does this random variable
represent? Proceeding inductively suppose, for instance, that $N(t) = 3$. Then $S_{N(t)} = S_3$
represents the time of the third event. Since there are only three events that have
occurred by time $t$, $S_3$ also represents the time of the last event prior to (or at)
time $t$. This is, in fact, what $S_{N(t)}$ represents, namely, the time of the last renewal
prior to or at time $t$. Similar reasoning leads to the conclusion that $S_{N(t)+1}$
represents the time of the first renewal after time $t$ (see Figure 7.2). We now are ready
to prove the following.


Figure 7.2


Proposition 7.1 With probability 1,

$$\frac{N(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty$$

Proof. Since $S_{N(t)}$ is the time of the last renewal prior to or at time $t$, and $S_{N(t)+1}$
is the time of the first renewal after time $t$, we have

$$S_{N(t)} \le t < S_{N(t)+1}$$

or

$$\frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{S_{N(t)+1}}{N(t)} \tag{7.6}$$

However, since $S_{N(t)}/N(t) = \sum_{i=1}^{N(t)} X_i / N(t)$ is the average of $N(t)$ independent
and identically distributed random variables, it follows by the strong law of large
numbers that $S_{N(t)}/N(t) \to \mu$ as $N(t) \to \infty$. But since $N(t) \to \infty$ when $t \to \infty$,
we obtain

$$\frac{S_{N(t)}}{N(t)} \to \mu \quad \text{as } t \to \infty$$

Furthermore, writing

$$\frac{S_{N(t)+1}}{N(t)} = \left(\frac{S_{N(t)+1}}{N(t) + 1}\right) \left(\frac{N(t) + 1}{N(t)}\right)$$

we have that $S_{N(t)+1}/(N(t) + 1) \to \mu$ by the same reasoning as before and

$$\frac{N(t) + 1}{N(t)} \to 1 \quad \text{as } t \to \infty$$

Hence,

$$\frac{S_{N(t)+1}}{N(t)} \to \mu \quad \text{as } t \to \infty$$

The result now follows by Equation (7.6) since $t/N(t)$ is between two random
variables, each of which converges to $\mu$ as $t \to \infty$. ■

Remarks 


(i) The preceding propositions are true even when $\mu$, the mean time between renewals,
is infinite. In this case, we interpret $1/\mu$ to be 0.
(ii) The number $1/\mu$ is called the rate of the renewal process.
(iii) Because the average time between renewals is $\mu$, it is quite intuitive that the average
rate at which renewals occur is 1 per every $\mu$ time units. ■


Example 7.4 Beverly has a radio that works on a single battery. As soon as the 
battery in use fails, Beverly immediately replaces it with a new battery. If the 
lifetime of a battery (in hours) is distributed uniformly over the interval (30, 60), 
then at what rate does Beverly have to change batteries? 


Solution: If we let $N(t)$ denote the number of batteries that have failed by time
$t$, we have by Proposition 7.1 that the rate at which Beverly replaces batteries
is given by

$$\lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{\mu} = \frac{1}{45}$$

That is, in the long run, Beverly will have to replace one battery every
45 hours. ■


Example 7.5 Suppose in Example 7.4 that Beverly does not keep any surplus 
batteries on hand, and so each time a failure occurs she must go and buy a new 
battery. If the amount of time it takes for her to get a new battery is uniformly dis- 
tributed over (0, 1), then what is the average rate that Beverly changes batteries? 


Solution: In this case the mean time between renewals is given by

$$\mu = E[U_1] + E[U_2]$$

where $U_1$ is uniform over (30, 60) and $U_2$ is uniform over (0, 1). Hence,

$$\mu = 45 + \tfrac{1}{2} = 45\tfrac{1}{2}$$

and so in the long run, Beverly will be putting in a new battery at the rate of
$\tfrac{2}{91}$. That is, she will put in two new batteries every 91 hours. ■
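
Proposition 7.1 can be illustrated by simulating one long sample path and computing $N(t)/t$. The sketch below is a rough check, not a proof: the function names, the time horizon, and the fixed random seed are assumptions chosen for illustration. It uses the renewal cycle of Example 7.5, a battery lifetime plus the time needed to buy a replacement.

```python
import random


def long_run_rate(draw_interarrival, horizon, seed=0):
    """Estimate N(t)/t at t = horizon from one simulated sample path."""
    rng = random.Random(seed)
    clock = draw_interarrival(rng)   # time of the first renewal, S_1
    renewals = 0
    while clock <= horizon:          # count every S_n <= horizon
        renewals += 1
        clock += draw_interarrival(rng)
    return renewals / horizon


def battery_cycle(rng):
    # Example 7.5: battery lifetime plus the trip to buy a replacement.
    return rng.uniform(30.0, 60.0) + rng.uniform(0.0, 1.0)


if __name__ == "__main__":
    print(long_run_rate(battery_cycle, 1_000_000.0), 2 / 91)
```

With a horizon of a million hours the estimate should lie close to $2/91 \approx 0.022$; shorter horizons give noisier values.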


Example 7.6 Suppose that potential customers arrive at a single-server bank 
in accordance with a Poisson process having rate $\lambda$. However, suppose that the
potential customer will enter the bank only if the server is free when he arrives. 
That is, if there is already a customer in the bank, then our arriver, rather than 
entering the bank, will go home. If we assume that the amount of time spent in 
the bank by an entering customer is a random variable having distribution G, 
then 


(a) what is the rate at which customers enter the bank? 
(b) what proportion of potential customers actually enter the bank? 


Solution: In answering these questions, let us suppose that at time 0 a customer
has just entered the bank. (That is, we define the process to start when the first
customer enters the bank.) If we let $\mu_G$ denote the mean service time, then, by
the memoryless property of the Poisson process, it follows that the mean time
between entering customers is

$$\mu = \mu_G + \frac{1}{\lambda}$$

Hence, the rate at which customers enter the bank will be given by

$$\frac{1}{\mu} = \frac{\lambda}{1 + \lambda\mu_G}$$

On the other hand, since potential customers will be arriving at a rate $\lambda$, it
follows that the proportion of them entering the bank will be given by

$$\frac{\lambda/(1 + \lambda\mu_G)}{\lambda} = \frac{1}{1 + \lambda\mu_G}$$

In particular if $\lambda = 2$ and $\mu_G = 2$, then only one customer out of five will
actually enter the system. ■
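
The answer $1/(1 + \lambda\mu_G)$ depends on the service distribution $G$ only through its mean. The following simulation sketch illustrates this with $\lambda = 2$ and a service time that is uniform on (1, 3), so that $\mu_G = 2$. The choice of $G$, the function name, and the seed are assumptions made for this illustration, and the simulation starts with an empty bank rather than with a customer entering at time 0.

```python
import random


def fraction_entering(lam, draw_service, num_arrivals, seed=0):
    """Simulate the bank of Example 7.6; return the fraction of arrivals that enter."""
    rng = random.Random(seed)
    clock = 0.0
    busy_until = 0.0       # time at which the current customer leaves
    entered = 0
    for _ in range(num_arrivals):
        clock += rng.expovariate(lam)     # next potential customer arrives
        if clock >= busy_until:           # server free: the customer enters
            entered += 1
            busy_until = clock + draw_service(rng)
        # otherwise the potential customer goes home
    return entered / num_arrivals


if __name__ == "__main__":
    frac = fraction_entering(2.0, lambda rng: rng.uniform(1.0, 3.0), 200_000)
    print(frac)            # should be near 1 / (1 + 2 * 2) = 0.2
```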


A somewhat unusual application of Proposition 7.1 is provided by our next 
example. 


Example 7.7 A sequence of independent trials, each of which results in outcome
number $i$ with probability $P_i$, $i = 1, \ldots, n$, $\sum_{i=1}^{n} P_i = 1$, is observed until the
same outcome occurs $k$ times in a row; this outcome then is declared to be the
winner of the game. For instance, if $k = 2$ and the sequence of outcomes is
1, 2, 4, 3, 5, 2, 1, 3, 3, then we stop after nine trials and declare outcome number 3
the winner. What is the probability that $i$ wins, $i = 1, \ldots, n$, and what is the
expected number of trials?


Solution: We begin by computing the expected number of coin tosses, call it
$E[T]$, until a run of $k$ successive heads occurs when the tosses are independent
and each lands on heads with probability $p$. By conditioning on the time of the
first nonhead, we obtain

$$E[T] = \sum_{j=1}^{k} (1 - p) p^{j-1} (j + E[T]) + k p^k$$


Solving this for $E[T]$ yields

$$E[T] = k + \frac{(1 - p)}{p^k} \sum_{j=1}^{k} j p^{j-1}$$

Upon simplifying, we obtain

$$E[T] = \frac{1 + p + \cdots + p^{k-1}}{p^k} = \frac{1 - p^k}{p^k (1 - p)} \tag{7.7}$$

Now, let us return to our example, and let us suppose that as soon as the
winner of a game has been determined we immediately begin playing another
game. For each $i$ let us determine the rate at which outcome $i$ wins. Now, every
time $i$ wins, everything starts over again and thus wins by $i$ constitute renewals.
Hence, from Proposition 7.1, the

$$\text{rate at which } i \text{ wins} = \frac{1}{E[N_i]}$$

where $N_i$ denotes the number of trials played between successive wins of
outcome $i$. Hence, from Equation (7.7) we see that

$$\text{rate at which } i \text{ wins} = \frac{P_i^k (1 - P_i)}{1 - P_i^k} \tag{7.8}$$

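Hence, the long-run proportion of games that are won by number $i$ is given by

$$\begin{aligned}
\text{proportion of games } i \text{ wins} &= \frac{\text{rate at which } i \text{ wins}}{\sum_{j=1}^{n} \text{rate at which } j \text{ wins}} \\
&= \frac{P_i^k (1 - P_i)/(1 - P_i^k)}{\sum_{j=1}^{n} \left(P_j^k (1 - P_j)/(1 - P_j^k)\right)}
\end{aligned}$$

However, it follows from the strong law of large numbers that the long-run
proportion of games that $i$ wins will, with probability 1, be equal to the
probability that $i$ wins any given game. Hence,

$$P\{i \text{ wins}\} = \frac{P_i^k (1 - P_i)/(1 - P_i^k)}{\sum_{j=1}^{n} \left(P_j^k (1 - P_j)/(1 - P_j^k)\right)}$$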

To compute the expected time of a game, we first note that the

$$\begin{aligned}
\text{rate at which games end} &= \sum_{i=1}^{n} \text{rate at which } i \text{ wins} \\
&= \sum_{i=1}^{n} \frac{P_i^k (1 - P_i)}{1 - P_i^k} \quad \text{(from Equation (7.8))}
\end{aligned}$$

Now, as everything starts over when a game ends, it follows by Proposition 7.1
that the rate at which games end is equal to the reciprocal of the mean time of
a game. Hence,

$$\begin{aligned}
E[\text{time of a game}] &= \frac{1}{\text{rate at which games end}} \\
&= \frac{1}{\sum_{i=1}^{n} \left(P_i^k (1 - P_i)/(1 - P_i^k)\right)}
\end{aligned}$$
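
The formulas of Example 7.7 are easy to evaluate. The sketch below is a direct transcription under the assumption that every probability lies strictly between 0 and 1; the function names are hypothetical. Two cases can be checked by hand: with $k = 1$ the first trial decides the game, and with two equally likely outcomes and $k = 2$ each trial after the first matches its predecessor with probability 1/2, so the mean game length is $1 + 2 = 3$.

```python
def expected_tosses_for_run(p, k):
    """Equation (7.7): mean number of trials until k successes in a row."""
    return (1 - p**k) / (p**k * (1 - p))


def run_game(probs, k):
    """Return (win probabilities, expected game length) for Example 7.7."""
    # Equation (7.8): the rate at which outcome i wins is 1 / E[N_i].
    rates = [1.0 / expected_tosses_for_run(p, k) for p in probs]
    total = sum(rates)               # rate at which games end
    return [r / total for r in rates], 1.0 / total


if __name__ == "__main__":
    print(run_game([0.5, 0.5], 2))
    print(run_game([0.2, 0.3, 0.5], 3))
```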


Proposition 7.1 says that the average renewal rate up to time $t$ will, with
probability 1, converge to $1/\mu$ as $t \to \infty$. What about the expected average renewal
rate? Is it true that $m(t)/t$ also converges to $1/\mu$? This result is known as the
elementary renewal theorem.


Theorem 7.1 Elementary Renewal Theorem

$$\frac{m(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty$$

As before, $1/\mu$ is interpreted as 0 when $\mu = \infty$.


Remark At first glance it might seem that the elementary renewal theorem should
be a simple consequence of Proposition 7.1. That is, since the average renewal
rate will, with probability 1, converge to $1/\mu$, should this not imply that the
expected average renewal rate also converges to $1/\mu$? We must, however, be
careful; consider the next example.


Example 7.8 Let $U$ be a random variable which is uniformly distributed on (0, 1);
and define the random variables $Y_n$, $n \ge 1$, by

$$Y_n = \begin{cases} 0, & \text{if } U > 1/n \\ n, & \text{if } U \le 1/n \end{cases}$$

Now, since, with probability 1, $U$ will be greater than 0, it follows that $Y_n$ will
equal 0 for all sufficiently large $n$. That is, $Y_n$ will equal 0 for all $n$ large enough
so that $1/n < U$. Hence, with probability 1,

$$Y_n \to 0 \quad \text{as } n \to \infty$$

However,

$$E[Y_n] = n P\left\{U \le \frac{1}{n}\right\} = n \cdot \frac{1}{n} = 1$$

Therefore, even though the sequence of random variables $Y_n$ converges to 0, the
expected values of the $Y_n$ are all identically 1. ■


A key element in the proof of the elementary renewal theorem, which is also
of independent interest, is the establishment of a relationship between $m(t)$, the
mean number of renewals by time $t$, and $E[S_{N(t)+1}]$, the expected time of the first
renewal after $t$. Letting

$$g(t) = E[S_{N(t)+1}]$$

we will derive an integral equation, similar to the renewal equation, for $g(t)$ by
conditioning on the time of the first renewal. This yields

$$g(t) = \int_0^{\infty} E[S_{N(t)+1} \mid X_1 = x] f(x)\,dx$$

where we have supposed that the interarrival times are continuous with density $f$.
Now, if the first renewal occurs at time $x$ and $x > t$, then clearly the time of the
first renewal after $t$ is $x$. On the other hand, if the first renewal occurs at a time
$x < t$, then by regarding $x$ as the new origin, it follows that the expected time,
from this origin, of the first renewal occurring after a time $t - x$ from this origin
is $g(t - x)$. That is, we see that

$$E[S_{N(t)+1} \mid X_1 = x] = \begin{cases} g(t - x) + x, & \text{if } x < t \\ x, & \text{if } x > t \end{cases}$$

Substituting this into the preceding equation gives

$$\begin{aligned}
g(t) &= \int_0^t (g(t - x) + x) f(x)\,dx + \int_t^{\infty} x f(x)\,dx \\
&= \int_0^t g(t - x) f(x)\,dx + \int_0^{\infty} x f(x)\,dx
\end{aligned}$$

or

$$g(t) = \mu + \int_0^t g(t - x) f(x)\,dx$$

which is quite similar to the renewal equation

$$m(t) = F(t) + \int_0^t m(t - x) f(x)\,dx$$
0 
Indeed, if we let 
le 
iu 


git) 1 
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we see that 

gift) +1=1+ [ise x) + 1]f (x) dx 
or 

gi(t) = F(t) + [ise — x)f (x) dx 


That is, gi(t) = E[Snq+1]/u — 1 satisfies the renewal equation and thus, by 
uniqueness, must be equal to m(t). We have thus proven the following. 


Proposition 7.2

$$E[S_{N(t)+1}] = \mu[m(t) + 1]$$


A second derivation of Proposition 7.2 is given in Exercises 13 and 14. To see
how Proposition 7.2 can be used to establish the elementary renewal theorem,
let $Y(t)$ denote the time from $t$ until the next renewal. $Y(t)$ is called the excess,
or residual life, at $t$. As the first renewal after $t$ will occur at time $t + Y(t)$, we
see that

$$S_{N(t)+1} = t + Y(t)$$

Taking expectations and utilizing Proposition 7.2 yields

$$\mu[m(t) + 1] = t + E[Y(t)] \tag{7.9}$$

which implies that

$$\frac{m(t)}{t} = \frac{1}{\mu} - \frac{1}{t} + \frac{E[Y(t)]}{t\mu}$$

The elementary renewal theorem can now be proven by showing that

$$\lim_{t \to \infty} \frac{E[Y(t)]}{t} = 0$$

(see Exercise 14).
Relation (7.9) shows that if we can determine $E[Y(t)]$, the mean excess at $t$,
then we can compute $m(t)$ and vice versa.


Example 7.9 Consider the renewal process whose interarrival distribution is the
convolution of two exponentials; that is,

$$F = F_1 * F_2, \quad \text{where } F_i(t) = 1 - e^{-\mu_i t}, \quad i = 1, 2$$

We will determine the renewal function by first determining $E[Y(t)]$. To obtain
the mean excess at $t$, imagine that each renewal corresponds to a new machine
being put in use, and suppose that each machine has two components: initially
component 1 is employed and this lasts an exponential time with rate $\mu_1$, and then
component 2, which functions for an exponential time with rate $\mu_2$, is employed.
When component 2 fails, a new machine is put in use (that is, a renewal occurs).
Now consider the process $\{X(t), t \ge 0\}$ where $X(t)$ is $i$ if a type $i$ component
is in use at time $t$. It is easy to see that $\{X(t), t \ge 0\}$ is a two-state
continuous-time Markov chain, and so, using the results of Example 6.11, its transition
probabilities are

$$P_{11}(t) = \frac{\mu_1}{\mu_1 + \mu_2} e^{-(\mu_1 + \mu_2)t} + \frac{\mu_2}{\mu_1 + \mu_2}$$

To compute the expected remaining life of the machine in use at time $t$, we
condition on whether it is using its first or second component: if it is still using its
first component, then its expected remaining life is $1/\mu_1 + 1/\mu_2$, whereas if it is
already using its second component, then its expected remaining life is $1/\mu_2$. Hence,
letting $p(t)$ denote the probability that the machine in use at time $t$ is using its first
component, we have

$$E[Y(t)] = \left(\frac{1}{\mu_1} + \frac{1}{\mu_2}\right) p(t) + \frac{1 - p(t)}{\mu_2} = \frac{1}{\mu_2} + \frac{p(t)}{\mu_1}$$

But, since at time 0 the first machine is utilizing its first component, it follows
that $p(t) = P_{11}(t)$, and so, upon using the preceding expression of $P_{11}(t)$,
we obtain

$$E[Y(t)] = \frac{1}{\mu_2} + \frac{1}{\mu_1 + \mu_2} e^{-(\mu_1 + \mu_2)t} + \frac{\mu_2}{\mu_1(\mu_1 + \mu_2)} \tag{7.10}$$

Now it follows from Equation (7.9) that

$$m(t) + 1 = \frac{t}{\mu} + \frac{E[Y(t)]}{\mu} \tag{7.11}$$

where $\mu$, the mean interarrival time, is given in this case by

$$\mu = \frac{1}{\mu_1} + \frac{1}{\mu_2} = \frac{\mu_1 + \mu_2}{\mu_1 \mu_2}$$


Substituting Equation (7.10) and the preceding equation into (7.11) yields, after 
simplifying, 


$$m(t) = \frac{\mu_1 \mu_2}{\mu_1 + \mu_2}\, t - \frac{\mu_1 \mu_2}{(\mu_1 + \mu_2)^2} \left[ 1 - e^{-(\mu_1 + \mu_2)t} \right]$$

Remark Using the relationship of Equation (7.11) and results from the two-state
continuous-time Markov chain, the renewal function can also be obtained in the
same manner as in Example 7.9 for the interarrival distributions

$$F(t) = pF_1(t) + (1 - p)F_2(t)$$

and

$$F(t) = pF_1(t) + (1 - p)(F_1 * F_2)(t)$$

when $F_i(t) = 1 - e^{-\mu_i t}$, $t > 0$, $i = 1, 2$.
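Returning to Example 7.9, the simplification that leads from Equations (7.10) and (7.11) to the closed form for $m(t)$ can be checked numerically. This is a sketch only; the function names are illustrative, and any positive rates may be passed for `mu1` and `mu2`.

```python
import math

def mean_excess(t, mu1, mu2):
    """E[Y(t)] as given by Equation (7.10)."""
    s = mu1 + mu2
    return 1 / mu2 + math.exp(-s * t) / s + mu2 / (mu1 * s)

def renewal_function(t, mu1, mu2):
    """Closed form for m(t) stated at the end of Example 7.9."""
    s = mu1 + mu2
    return mu1 * mu2 / s * t - mu1 * mu2 / s**2 * (1 - math.exp(-s * t))

def renewal_function_via_7_11(t, mu1, mu2):
    """m(t) = t/mu + E[Y(t)]/mu - 1, with mu = 1/mu1 + 1/mu2."""
    mu = 1 / mu1 + 1 / mu2
    return t / mu + mean_excess(t, mu1, mu2) / mu - 1
```

For instance, `renewal_function(t, 2.0, 3.0)` and `renewal_function_via_7_11(t, 2.0, 3.0)` should agree up to floating-point rounding for every `t >= 0`, and both vanish at `t = 0`.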


Suppose the interarrival times of a renewal process are all positive integer
valued. Let

$$I_i = \begin{cases} 1, & \text{if there is a renewal at time } i \\ 0, & \text{otherwise} \end{cases}$$

and note that $N(n)$, the number of renewals by time $n$, can be expressed as

$$N(n) = \sum_{i=1}^{n} I_i$$

Taking expectations of both sides of the preceding shows that

$$m(n) = E[N(n)] = \sum_{i=1}^{n} P(\text{renewal at time } i)$$

Hence, the elementary renewal theorem yields

$$\frac{\sum_{i=1}^{n} P(\text{renewal at time } i)}{n} \to \frac{1}{E[\text{time between renewals}]}$$

Now, for a sequence of numbers $a_1, a_2, \ldots$ it can be shown that

$$\lim_{n \to \infty} a_n = a \;\Rightarrow\; \lim_{n \to \infty} \frac{\sum_{i=1}^{n} a_i}{n} = a$$

Hence, if $\lim_{n \to \infty} P(\text{renewal at time } n)$ exists then that limit must equal

$$\frac{1}{E[\text{time between renewals}]}$$
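To see this limit numerically, one can compute $P(\text{renewal at time } n)$ for a small integer-valued interarrival distribution. The sketch below uses the standard first-step recursion obtained by conditioning on the first interarrival time, which is not spelled out in the text; the two-point distribution `pmf` is a hypothetical example.

```python
def renewal_probabilities(pmf, n_max):
    """u[n] = P(renewal at time n) for integer interarrival times.

    pmf maps k >= 1 to P(X = k). Conditioning on the first interarrival
    time gives u[n] = sum_k pmf[k] * u[n - k], with u[0] = 1.
    """
    u = [1.0] + [0.0] * n_max
    for n in range(1, n_max + 1):
        u[n] = sum(p * u[n - k] for k, p in pmf.items() if k <= n)
    return u

pmf = {1: 0.5, 2: 0.5}                       # hypothetical interarrival distribution
mean = sum(k * p for k, p in pmf.items())    # E[time between renewals] = 1.5
u = renewal_probabilities(pmf, 60)
average = sum(u[1:]) / 60                    # (1/n) * sum of P(renewal at time i)
```

Here both `u[60]` and `average` are close to `1 / mean`, the individual probabilities converging much faster than their running average.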


Example 7.10 Let $X_i$, $i \ge 1$ be independent and identically distributed random
variables, and set

$$S_0 = 0, \quad S_n = \sum_{i=1}^{n} X_i, \quad n > 0$$

The process $\{S_n, n \ge 0\}$ is called a random walk process. Suppose that $E[X_i] < 0$.
The strong law of large numbers yields

$$\lim_{n \to \infty} \frac{S_n}{n} = E[X_i] \quad \text{with probability 1}$$

But if $S_n$ divided by $n$ is converging to a negative number, then $S_n$ must be going
to minus infinity. Let $\alpha$ be the probability that the random walk is always negative
after the initial movement. That is,

$$\alpha = P(S_n < 0 \text{ for all } n \ge 1)$$

To determine $\alpha$, define a counting process by saying that an event occurs at time
$n$ if $S_n < \min(0, S_1, \ldots, S_{n-1})$. That is, an event occurs each time the random
walk process reaches a new low. Now, if an event occurs at time $n$, then the next
event will occur $k$ time units later if

$$X_{n+1} \ge 0, \quad X_{n+1} + X_{n+2} \ge 0, \quad \ldots, \quad X_{n+1} + \cdots + X_{n+k-1} \ge 0, \quad X_{n+1} + \cdots + X_{n+k} < 0$$

Because $X_i$, $i \ge 1$ are independent and identically distributed, the preceding event
is independent of the values of $X_1, \ldots, X_n$, and its probability of occurrence
does not depend on $n$. Consequently, the times between successive events are
independent and identically distributed, showing that the counting process is a
renewal process. Now,

$$\begin{aligned}
P(\text{renewal at } n) &= P(S_n < 0, S_n < S_1, S_n < S_2, \ldots, S_n < S_{n-1}) \\
&= P(X_1 + \cdots + X_n < 0, X_2 + \cdots + X_n < 0, X_3 + \cdots + X_n < 0, \ldots, X_n < 0)
\end{aligned}$$

Because $X_n, X_{n-1}, \ldots, X_1$ has the same joint distribution as does $X_1, X_2, \ldots, X_n$
it follows that the value of the preceding probability would be unchanged if $X_1$
were replaced by $X_n$; $X_2$ were replaced by $X_{n-1}$; $X_3$ were replaced by $X_{n-2}$; and
so on. Consequently,

$$\begin{aligned}
P(\text{renewal at } n) &= P(X_n + \cdots + X_1 < 0, X_{n-1} + \cdots + X_1 < 0, X_{n-2} + \cdots + X_1 < 0, \ldots, X_1 < 0) \\
&= P(S_n < 0, S_{n-1} < 0, S_{n-2} < 0, \ldots, S_1 < 0)
\end{aligned}$$

Hence,

$$\lim_{n \to \infty} P(\text{renewal at } n) = P(S_n < 0 \text{ for all } n \ge 1) = \alpha$$




But, by the elementary renewal theorem, this implies that

$$\alpha = 1/E[T]$$

where $T$ is the time between renewals. That is,

$$T = \min\{n : S_n < 0\}$$

For instance, in the case of left skip free random walks (which are ones for which
$\sum_{j=-1}^{\infty} P(X_i = j) = 1$) we showed in Section 3.6.6 that $E[T] = -1/E[X_i]$ when
$E[X_i] < 0$, showing that for skip free random walks having a negative mean,

$$P(S_n < 0 \text{ for all } n) = -E[X_i]$$

which verifies a result previously obtained in Section 3.6.6.
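The identity for left skip free walks can be checked on a small case by computing $P(S_1 < 0, \ldots, S_n < 0)$ exactly for a large $n$. The following is a sketch under an assumed step distribution (steps of $-1$ with probability 0.7 and $+1$ with probability 0.3, so that $E[X_i] = -0.4$); the function name is illustrative.

```python
def prob_negative_through(step_pmf, n):
    """P(S_1 < 0, ..., S_n < 0) for an integer-valued random walk.

    step_pmf maps each integer step j to P(X = j). Probability mass is
    tracked only on paths that have stayed strictly negative so far.
    """
    mass = {0: 1.0}                       # S_0 = 0
    for _ in range(n):
        nxt = {}
        for pos, p in mass.items():
            for step, q in step_pmf.items():
                new = pos + step
                if new < 0:               # discard paths that reach >= 0
                    nxt[new] = nxt.get(new, 0.0) + p * q
        mass = nxt
    return sum(mass.values())

step_pmf = {-1: 0.7, 1: 0.3}   # hypothetical left skip free walk, E[X] = -0.4
alpha_n = prob_negative_through(step_pmf, 400)
```

As $n$ grows, `alpha_n` decreases toward $\alpha$, and for this walk it settles at 0.4, which equals $-E[X_i]$.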


An important limit theorem is the central limit theorem for renewal processes.
This states that, for large $t$, $N(t)$ is approximately normally distributed with
mean $t/\mu$ and variance $t\sigma^2/\mu^3$, where $\mu$ and $\sigma^2$ are, respectively, the mean and
variance of the interarrival distribution. That is, we have the following theorem
which we state without proof.


Theorem 7.2 Central Limit Theorem for Renewal Processes

$$\lim_{t \to \infty} P\left\{ \frac{N(t) - t/\mu}{\sqrt{t\sigma^2/\mu^3}} < x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$$

In addition, as might be expected from the central limit theorem for renewal
processes, it can be shown that $\operatorname{Var}(N(t))/t$ converges to $\sigma^2/\mu^3$. That is, it can
be shown that

$$\lim_{t \to \infty} \frac{\operatorname{Var}(N(t))}{t} = \sigma^2/\mu^3 \tag{7.12}$$


Example 7.11 Two machines continually process an unending number of jobs.
The time that it takes to process a job on machine 1 is a gamma random variable
with parameters $n = 4$, $\lambda = 2$, whereas the time that it takes to process
a job on machine 2 is uniformly distributed between 0 and 4. Approximate the
probability that together the two machines can process at least 90 jobs by time
$t = 100$.


Solution: If we let $N_i(t)$ denote the number of jobs that machine $i$ can process
by time $t$, then $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are independent
renewal processes. The interarrival distribution of the first renewal process
is gamma with parameters $n = 4$, $\lambda = 2$, and thus has mean 2 and variance 1.
Correspondingly, the interarrival distribution of the second renewal process is
uniform between 0 and 4, and thus has mean 2 and variance 16/12.

Therefore, $N_1(100)$ is approximately normal with mean 50 and variance
100/8; and $N_2(100)$ is approximately normal with mean 50 and variance
100/6. Hence, $N_1(100) + N_2(100)$ is approximately normal with mean 100
and variance 175/6. Thus, with $\Phi$ denoting the standard normal distribution
function, we have

$$\begin{aligned}
P\{N_1(100) + N_2(100) > 89.5\} &= P\left\{ \frac{N_1(100) + N_2(100) - 100}{\sqrt{175/6}} > \frac{89.5 - 100}{\sqrt{175/6}} \right\} \\
&\approx 1 - \Phi\left( \frac{-10.5}{\sqrt{175/6}} \right) \\
&= \Phi\left( \frac{10.5}{\sqrt{175/6}} \right) \\
&\approx \Phi(1.944) \\
&\approx 0.9741
\end{aligned}$$
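The arithmetic of Example 7.11 can be reproduced with the standard library alone. This is a sketch; `std_normal_cdf` and `clt_mean_var` are illustrative helper names, and the normal distribution function is evaluated through `math.erf`.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clt_mean_var(t, mean, var):
    """Approximate mean and variance of N(t): t/mu and t*sigma^2/mu^3."""
    return t / mean, t * var / mean**3

t = 100
m1, v1 = clt_mean_var(t, mean=2.0, var=1.0)           # gamma(n = 4, lambda = 2) job times
m2, v2 = clt_mean_var(t, mean=2.0, var=16.0 / 12.0)   # uniform(0, 4) job times
z = (89.5 - (m1 + m2)) / math.sqrt(v1 + v2)           # continuity-corrected threshold
prob = 1.0 - std_normal_cdf(z)
```

The two variances add because the renewal processes are independent, and the threshold 89.5 is the continuity correction for "at least 90".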


7.4 Renewal Reward Processes 


A large number of probability models are special cases of the following model.
Consider a renewal process $\{N(t), t \ge 0\}$ having interarrival times $X_n$, $n \ge 1$,
and suppose that each time a renewal occurs we receive a reward. We denote by
$R_n$ the reward earned at the time of the $n$th renewal. We shall assume that the
$R_n$, $n \ge 1$, are independent and identically distributed. However, we do allow
for the possibility that $R_n$ may (and usually will) depend on $X_n$, the length of
the $n$th renewal interval. If we let

$$R(t) = \sum_{n=1}^{N(t)} R_n$$

then $R(t)$ represents the total reward earned by time $t$. Let

$$E[R] = E[R_n], \quad E[X] = E[X_n]$$

Proposition 7.3 If $E[R] < \infty$ and $E[X] < \infty$, then

(a) with probability 1, $\lim_{t \to \infty} \dfrac{R(t)}{t} = \dfrac{E[R]}{E[X]}$

(b) $\lim_{t \to \infty} \dfrac{E[R(t)]}{t} = \dfrac{E[R]}{E[X]}$

Proof. We give the proof for (a) only. To prove this, write

$$\frac{R(t)}{t} = \frac{\sum_{n=1}^{N(t)} R_n}{t} = \left( \frac{\sum_{n=1}^{N(t)} R_n}{N(t)} \right) \left( \frac{N(t)}{t} \right)$$

By the strong law of large numbers we obtain

$$\frac{\sum_{n=1}^{N(t)} R_n}{N(t)} \to E[R] \quad \text{as } t \to \infty$$

and by Proposition 7.1

$$\frac{N(t)}{t} \to \frac{1}{E[X]} \quad \text{as } t \to \infty$$

The result thus follows.
Remark 


(i) If we say that a cycle is completed every time a renewal occurs, then Proposition 7.3 
states that the long-run average reward per unit time is equal to the expected reward 
earned during a cycle divided by the expected length of a cycle. For instance, in 
Example 7.6 if we suppose that the amounts that the successive customers deposit in 
the bank are independent random variables having a common distribution H, then the 
rate at which deposits accumulate, that is, $\lim_{t \to \infty}$ (total deposits by the time $t$)$/t$,
is given by

$$\frac{E[\text{deposits during a cycle}]}{E[\text{time of cycle}]} = \frac{\mu_H}{\mu_G + 1/\lambda}$$

where $\mu_G + 1/\lambda$ is the mean time of a cycle, and $\mu_H$ is the mean of the distribution $H$.

(ii) Although we have supposed that the reward is earned at the time of a renewal, the 
result remains valid when the reward is earned gradually throughout the renewal 
cycle. 


Example 7.12 (A Car Buying Model) The lifetime of a car is a continuous random
variable having a distribution $H$ and probability density $h$. Mr. Brown has a policy
that he buys a new car as soon as his old one either breaks down or reaches the
age of $T$ years. Suppose that a new car costs $C_1$ dollars and also that an additional
cost of $C_2$ dollars is incurred whenever Mr. Brown’s car breaks down. Under the
assumption that a used car has no resale value, what is Mr. Brown’s long-run
average cost?

If we say that a cycle is complete every time Mr. Brown gets a new car, then
it follows from Proposition 7.3 (with costs replacing rewards) that his long-run
average cost equals

$$\frac{E[\text{cost incurred during a cycle}]}{E[\text{length of a cycle}]}$$

Now letting $X$ be the lifetime of Mr. Brown’s car during an arbitrary cycle,
the cost incurred during that cycle will be given by

$$\begin{cases} C_1, & \text{if } X > T \\ C_1 + C_2, & \text{if } X \le T \end{cases}$$

so the expected cost incurred over a cycle is

$$C_1 P\{X > T\} + (C_1 + C_2) P\{X \le T\} = C_1 + C_2 H(T)$$

Also, the length of the cycle is

$$\begin{cases} X, & \text{if } X \le T \\ T, & \text{if } X > T \end{cases}$$

and so the expected length of a cycle is

$$\int_0^T x h(x)\,dx + \int_T^\infty T h(x)\,dx = \int_0^T x h(x)\,dx + T[1 - H(T)]$$

Therefore, Mr. Brown’s long-run average cost will be

$$\frac{C_1 + C_2 H(T)}{\int_0^T x h(x)\,dx + T[1 - H(T)]} \tag{7.13}$$


Now, suppose that the lifetime of a car (in years) is uniformly distributed over
(0, 10), and suppose that $C_1$ is 3 (thousand) dollars and $C_2$ is $\frac{1}{2}$ (thousand)
dollars. What value of $T$ minimizes Mr. Brown’s long-run average cost?

If Mr. Brown uses the value $T$, $T \le 10$, then from Equation (7.13) his long-run
average cost equals

$$\frac{3 + \frac{1}{2}(T/10)}{\int_0^T (x/10)\,dx + T(1 - T/10)} = \frac{3 + T/20}{T^2/20 + (10T - T^2)/10} = \frac{60 + T}{20T - T^2}$$

We can now minimize this by using the calculus. Toward this end, let

$$g(T) = \frac{60 + T}{20T - T^2}$$

then

$$g'(T) = \frac{(20T - T^2) - (60 + T)(20 - 2T)}{(20T - T^2)^2}$$

Equating to 0 yields

$$20T - T^2 = (60 + T)(20 - 2T)$$

or, equivalently,

$$T^2 + 120T - 1200 = 0$$

which yields the solutions

$$T \approx 9.28 \quad \text{and} \quad T \approx -129.28$$

Since $T \le 10$, it follows that the optimal policy for Mr. Brown would be to
purchase a new car whenever his old car reaches the age of 9.28 years.
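The minimization in Example 7.12 can be checked numerically, which also gives a way to verify the rounded root quoted in the text. The sketch below assumes $C_1 = 3$ and $C_2 = 1/2$ (in thousands), the values consistent with the numerator $3 + T/20$ of the displayed cost; the function name and the grid resolution are illustrative.

```python
import math

def average_cost(T, c1=3.0, c2=0.5, upper=10.0):
    """Equation (7.13) for a lifetime uniform on (0, upper), 0 < T <= upper."""
    expected_cost = c1 + c2 * (T / upper)                 # C1 + C2 * H(T)
    expected_length = T**2 / (2 * upper) + T * (1 - T / upper)
    return expected_cost / expected_length

# Positive root of T^2 + 120 T - 1200 = 0.
T_star = (-120 + math.sqrt(120**2 + 4 * 1200)) / 2

# Coarse grid search over (0, 10] as an independent check.
grid = [k / 1000 for k in range(1, 10001)]
T_grid = min(grid, key=average_cost)
```

The quadratic root and the grid minimizer should agree to within the grid spacing, which confirms that the stationary point is a minimum on $(0, 10]$.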


Example 7.13 (Dispatching a Train) Suppose that customers arrive at a train
depot in accordance with a renewal process having a mean interarrival time $\mu$.
Whenever there are $N$ customers waiting in the depot, a train leaves. If the depot
incurs a cost at the rate of $nc$ dollars per unit time whenever there are $n$ customers
waiting, what is the average cost incurred by the depot?

If we say that a cycle is completed whenever a train leaves, then the preceding
is a renewal reward process. The expected length of a cycle is the expected time
required for $N$ customers to arrive and, since the mean interarrival time is $\mu$, this
equals

$$E[\text{length of cycle}] = N\mu$$

If we let $T_n$ denote the time between the $n$th and $(n + 1)$st arrival in a cycle, then
the expected cost of a cycle may be expressed as

$$E[\text{cost of a cycle}] = E[cT_1 + 2cT_2 + \cdots + (N - 1)cT_{N-1}]$$

which, since $E[T_n] = \mu$, equals

$$c\mu \frac{N}{2}(N - 1)$$

Hence, the average cost incurred by the depot is

$$\frac{c\mu N(N - 1)}{2N\mu} = \frac{c(N - 1)}{2}$$

Suppose now that each time a train leaves, the depot incurs a cost of six units.
What value of $N$ minimizes the depot’s long-run average cost when $c = 2$, $\mu = 1$?

In this case, we have that the average cost per unit time for a given $N$ is

$$\frac{6 + c\mu N(N - 1)/2}{N\mu} = N - 1 + \frac{6}{N}$$

By treating this as a continuous function of $N$ and using the calculus, we obtain
that the minimizing value of $N$ is

$$N = \sqrt{6} \approx 2.45$$

Hence, the optimal integral value of $N$ is either 2, which yields a value 4, or 3,
which also yields the value 4. Hence, either $N = 2$ or $N = 3$ minimizes the
depot’s average cost.
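The tie between $N = 2$ and $N = 3$ in Example 7.13 is easy to confirm by tabulating the cost rate. A minimal sketch, with the function name and the search range $1 \le N \le 10$ chosen only for illustration:

```python
def depot_cost_rate(N, c=2.0, mu=1.0, dispatch_cost=6.0):
    """Long-run cost per unit time when a train leaves at N waiting customers."""
    holding = c * mu * N * (N - 1) / 2     # E[c*T_1 + ... + (N-1)*c*T_{N-1}]
    return (dispatch_cost + holding) / (N * mu)

costs = {N: depot_cost_rate(N) for N in range(1, 11)}
best = min(costs.values())
minimizers = [N for N, v in costs.items() if v == best]
```

Because the continuous minimizer $\sqrt{6}$ lies between 2 and 3, only those two integers need to be compared; the table simply makes the comparison explicit.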


Example 7.14 Suppose that customers arrive at a single-server system in accordance
with a Poisson process with rate $\lambda$. Upon arriving a customer must pass
through a door that leads to the server. However, each time someone passes
through, the door becomes locked for the next $t$ units of time. An arrival finding
a locked door is lost, and a cost $c$ is incurred by the system. An arrival finding
the door unlocked passes through to the server. If the server is free, the customer
enters service; if the server is busy, the customer departs without service and a
cost $K$ is incurred. If the service time of a customer is exponential with rate $\mu$,
find the average cost per unit time incurred by the system.


Solution: The preceding can be considered to be a renewal reward process,
with a new cycle beginning each time a customer arrives to find the door
unlocked. This is so because whether or not the arrival finds the server free,
the door will become locked for the next $t$ time units and the server will
be busy for a time $X$ that is exponentially distributed with rate $\mu$. (If the
server is free, $X$ is the service time of the entering customer; if the server is
busy, $X$ is the remaining service time of the customer in service.) Since the
next cycle will begin at the first arrival epoch after a time $t$ has passed, it
follows that

$$E[\text{time of a cycle}] = t + 1/\lambda$$

Let $C_1$ denote the cost incurred during a cycle due to arrivals finding the door
locked. Then, since each arrival in the first $t$ time units of a cycle will result in
a cost $c$, we have

$$E[C_1] = \lambda t c$$

Also, let $C_2$ denote the cost incurred during a cycle due to an arrival finding
the door unlocked but the server busy. Then because a cost $K$ is incurred if
the server is still busy a time $t$ after the cycle began and, in addition, the next
arrival after that time occurs before the service completion, we see that

$$E[C_2] = K e^{-\mu t} \frac{\lambda}{\lambda + \mu}$$

Consequently,

$$\text{average cost per unit time} = \frac{\lambda t c + \lambda K e^{-\mu t}/(\lambda + \mu)}{t + 1/\lambda}$$
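The cost rate of Example 7.14 is simple to evaluate for given parameters. The function below is a sketch whose name and argument names are illustrative; it only assembles the three expectations derived in the solution.

```python
import math

def door_system_cost_rate(lam, mu, t, c, K):
    """Average cost per unit time for the locked-door system of Example 7.14."""
    cycle_length = t + 1.0 / lam                                # E[time of a cycle]
    lost_at_door = lam * t * c                                  # E[C_1]
    lost_at_server = K * math.exp(-mu * t) * lam / (lam + mu)   # E[C_2]
    return (lost_at_door + lost_at_server) / cycle_length
```

Two limiting cases are a useful sanity check: with `t = 0` the door never blocks anyone and the rate reduces to $K\lambda^2/(\lambda + \mu)$, while for very large `t` almost every arrival is lost at the door and the rate approaches $\lambda c$.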


Example 7.15 Consider a manufacturing process that sequentially produces 
items, each of which is either defective or acceptable. The following type of sam- 
pling scheme is often employed in an attempt to detect and eliminate most of the 
defective items. Initially, each item is inspected and this continues until there are 
k consecutive items that are acceptable. At this point 100% inspection ends and 
each successive item is independently inspected with probability $\alpha$. This partial
inspection continues until a defective item is encountered, at which time 100% 
inspection is reinstituted, and the process begins anew. If each item is, indepen- 
dently, defective with probability q, 


(a) what proportion of items are inspected? 
(b) if defective items are removed when detected, what proportion of the remaining items 
are defective? 


Remark Before starting our analysis, note that the preceding inspection scheme 
was devised for situations in which the probability of producing a defective item 
changed over time. It was hoped that 100% inspection would correlate with 
times at which the defect probability was large and partial inspection when it 
was small. However, it is still important to see how such a scheme would work 
in the extreme case where the defect probability remains constant throughout. 


Solution: We begin our analysis by noting that we can treat the preceding as
a renewal reward process with a new cycle starting each time 100% inspection
is instituted. We then have

$$\text{proportion of items inspected} = \frac{E[\text{number inspected in a cycle}]}{E[\text{number produced in a cycle}]}$$

Let $N_k$ denote the number of items inspected until there are $k$ consecutive
acceptable items. Once partial inspection begins (that is, after $N_k$ items have
been produced), since each inspected item will be defective with probability $q$,
it follows that the expected number that will have to be inspected to find a
defective item is $1/q$. Hence,

$$E[\text{number inspected in a cycle}] = E[N_k] + \frac{1}{q}$$

In addition, since at partial inspection each item produced will, independently,
be inspected and found to be defective with probability $\alpha q$, it follows that the
expected number of items produced until one is inspected and found to be defective is
$1/(\alpha q)$, and so

$$E[\text{number produced in a cycle}] = E[N_k] + \frac{1}{\alpha q}$$

Also, as $E[N_k]$ is the expected number of trials needed to obtain $k$ acceptable
items in a row when each item is acceptable with probability $p = 1 - q$, it follows
from Example 3.14 that

$$E[N_k] = \frac{1}{p} + \frac{1}{p^2} + \cdots + \frac{1}{p^k} = \frac{(1/p)^k - 1}{q}$$

Hence, we obtain

$$P_I \equiv \text{proportion of items that are inspected} = \frac{(1/p)^k}{(1/p)^k - 1 + 1/\alpha}$$

To answer (b), note first that since each item produced is defective with probability
$q$ it follows that the proportion of items that are both inspected and found
to be defective is $qP_I$. Hence, for $N$ large, out of the first $N$ items produced
there will be (approximately) $NqP_I$ that are discovered to be defective and
thus removed. As the first $N$ items will contain (approximately) $Nq$ defective
items, it follows that there will be $Nq - NqP_I$ defective items not discovered.
Hence,

$$\text{proportion of the nonremoved items that are defective} \approx \frac{Nq(1 - P_I)}{N(1 - qP_I)}$$

As the approximation becomes exact as $N \to \infty$, we see that

$$\text{proportion of the nonremoved items that are defective} = \frac{q(1 - P_I)}{1 - qP_I}$$
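Both answers of Example 7.15 follow from the two per-cycle expectations, so they can be computed in a few lines. This is a sketch; `inspection_proportions` is an illustrative name, and `alpha` stands for the partial-inspection probability.

```python
def inspection_proportions(k, q, alpha):
    """Return (P_I, defective fraction among non-removed items)."""
    p = 1.0 - q
    expected_nk = ((1.0 / p) ** k - 1.0) / q       # E[N_k]
    inspected = expected_nk + 1.0 / q              # expected number inspected per cycle
    produced = expected_nk + 1.0 / (alpha * q)     # expected number produced per cycle
    p_i = inspected / produced
    outgoing_defective = q * (1.0 - p_i) / (1.0 - q * p_i)
    return p_i, outgoing_defective
```

With `alpha = 1` every item is inspected, so the function returns `(1.0, 0.0)`; smaller values of `alpha` trade inspection effort against the fraction of defectives that slip through.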
Example 7.16 (The Average Age of a Renewal Process) Consider a renewal process
having interarrival distribution $F$ and define $A(t)$ to be the time at $t$ since
the last renewal. If renewals represent old items failing and being replaced by
new ones, then $A(t)$ represents the age of the item in use at time $t$. Since $S_{N(t)}$
represents the time of the last event prior to or at time $t$, we have

$$A(t) = t - S_{N(t)}$$

We are interested in the average value of the age, that is, in

$$\lim_{s \to \infty} \frac{\int_0^s A(t)\,dt}{s}$$

To determine this quantity, we use renewal reward theory in the following way:
Let us assume that at any time we are being paid money at a rate equal to the
age of the renewal process at that time. That is, at time $t$, we are being paid at
rate $A(t)$, and so $\int_0^s A(t)\,dt$ represents our total earnings by time $s$. As everything
starts over again when a renewal occurs, it follows that

$$\frac{\int_0^s A(t)\,dt}{s} \to \frac{E[\text{reward during a renewal cycle}]}{E[\text{time of a renewal cycle}]}$$

Now, since the age of the renewal process a time $t$ into a renewal cycle is just $t$,
we have

$$\text{reward during a renewal cycle} = \int_0^X t\,dt = \frac{X^2}{2}$$

where $X$ is the time of the renewal cycle. Hence, we have that

$$\text{average value of age} \equiv \lim_{s \to \infty} \frac{\int_0^s A(t)\,dt}{s} = \frac{E[X^2]}{2E[X]} \tag{7.14}$$

where $X$ is an interarrival time having distribution function $F$.


Example 7.17 (The Average Excess of a Renewal Process) Another quantity associated
with a renewal process is $Y(t)$, the excess or residual time at time $t$. $Y(t)$
is defined to equal the time from $t$ until the next renewal and, as such, represents
the remaining (or residual) life of the item in use at time $t$. The average value of
the excess, namely,

$$\lim_{s \to \infty} \frac{\int_0^s Y(t)\,dt}{s}$$

also can be easily obtained by renewal reward theory. To do so, suppose that
we are paid at time $t$ at a rate equal to $Y(t)$. Then our average reward per unit
time will, by renewal reward theory, be given by

$$\text{average value of excess} \equiv \lim_{s \to \infty} \frac{\int_0^s Y(t)\,dt}{s} = \frac{E[\text{reward during a cycle}]}{E[\text{length of a cycle}]}$$

Now, letting $X$ denote the length of a renewal cycle, we have

$$\text{reward during a cycle} = \int_0^X (X - t)\,dt = \frac{X^2}{2}$$

and thus the average value of the excess is

$$\text{average value of excess} = \frac{E[X^2]}{2E[X]}$$

which was the same result obtained for the average value of the age of a renewal
process.
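The common limit $E[X^2]/(2E[X])$ for the average age and the average excess can be illustrated along a simulated path. The sketch below assumes interarrival times uniform on (0, 1), chosen only for illustration, and uses the fact that a cycle of length $x$ contributes $x^2/2$ to the integral of the age (and equally of the excess).

```python
import random

def time_average_age(interarrivals):
    """Time-average of A(t) over [0, S_n]: each cycle of length x adds x^2/2."""
    total_reward = sum(x * x / 2.0 for x in interarrivals)
    return total_reward / sum(interarrivals)

rng = random.Random(0)                                  # fixed seed for repeatability
xs = [rng.uniform(0.0, 1.0) for _ in range(200_000)]    # hypothetical U(0, 1) lifetimes
estimate = time_average_age(xs)
limit = (1.0 / 3.0) / (2.0 * 0.5)                       # E[X^2] / (2 E[X]) for U(0, 1)
```

The estimate should be close to `limit = 1/3`, which is larger than half the mean interarrival time of 1/4: longer cycles cover more of the time axis and so weigh more heavily in the time average.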


7.5 Regenerative Processes 


Consider a stochastic process $\{X(t), t \ge 0\}$ with state space $0, 1, 2, \ldots$, having
the property that there exist time points at which the process (probabilistically)
restarts itself. That is, suppose that with probability 1, there exists a time $T_1$,
such that the continuation of the process beyond $T_1$ is a probabilistic replica
of the whole process starting at 0. Note that this property implies the existence
of further times $T_2, T_3, \ldots$, having the same property as $T_1$. Such a stochastic
process is known as a regenerative process.

From the preceding, it follows that $T_1, T_2, \ldots$, constitute the arrival times of a
renewal process, and we shall say that a cycle is completed every time a renewal
occurs.


Examples


(1) A renewal process is regenerative, and $T_1$ represents the time of the first renewal.
(2) A recurrent Markov chain is regenerative, and $T_1$ represents the time of the first
transition into the initial state.


We are interested in determining the long-run proportion of time that a regenerative
process spends in state $j$. To obtain this quantity, let us imagine that we
earn a reward at a rate 1 per unit time when the process is in state $j$ and at rate
0 otherwise. That is, if $I(s)$ represents the rate at which we earn at time $s$, then

$$I(s) = \begin{cases} 1, & \text{if } X(s) = j \\ 0, & \text{if } X(s) \ne j \end{cases}$$

and

$$\text{total reward earned by } t = \int_0^t I(s)\,ds$$

As the preceding is clearly a renewal reward process that starts over again at the
cycle time $T_1$, we see from Proposition 7.3 that

$$\text{average reward per unit time} = \frac{E[\text{reward by time } T_1]}{E[T_1]}$$

However, the average reward per unit time is just equal to the proportion of time that
the process is in state $j$. That is, we have the following.


Proposition 7.4 For a regenerative process, the long-run

$$\text{proportion of time in state } j = \frac{E[\text{amount of time in } j \text{ during a cycle}]}{E[\text{time of a cycle}]}$$

Remark If the cycle time $T_1$ is a continuous random variable, then it can be
shown by using an advanced theorem called the “key renewal theorem” that the
preceding is equal also to the limiting probability that the system is in state $j$ at
time $t$. That is, if $T_1$ is continuous, then

$$\lim_{t \to \infty} P\{X(t) = j\} = \frac{E[\text{amount of time in } j \text{ during a cycle}]}{E[\text{time of a cycle}]}$$

Example 7.18 Consider a positive recurrent continuous-time Markov chain that 
is initially in state i. By the Markovian property, each time the process reenters 
state i it starts over again. Thus returns to state i are renewals and constitute the 
beginnings of new cycles. By Proposition 7.4, it follows that the long-run

$$\text{proportion of time in state } j = \frac{E[\text{amount of time in } j \text{ during an } i\text{–}i \text{ cycle}]}{\mu_{ii}}$$

where $\mu_{ii}$ represents the mean time to return to state $i$. If we take $j$ to equal $i$,
then we obtain

$$\text{proportion of time in state } i = \frac{1/v_i}{\mu_{ii}}$$


Example 7.19 (A Queueing System with Renewal Arrivals) Consider a waiting
time system in which customers arrive in accordance with an arbitrary renewal
process and are served one at a time by a single server having an arbitrary service
distribution. If we suppose that at time 0 the initial customer has just arrived,
then $\{X(t), t \ge 0\}$ is a regenerative process, where $X(t)$ denotes the number of
customers in the system at time $t$. The process regenerates each time a customer
arrives and finds the server free.


Example 7.20 Although a system needs only a single machine to function, it 
maintains an additional machine as a backup. A machine in use functions for a 
random time with density function f and then fails. If a machine fails while the 
other one is in working condition, then the latter is put in use and, simultaneously, 
repair begins on the one that just failed. If a machine fails while the other machine 
is in repair, then the newly failed machine waits until the repair is completed; at 
that time the repaired machine is put in use and, simultaneously, repair begins 
on the recently failed one. All repair times have density function $g$. Find $P_0$, $P_1$,
$P_2$, where $P_i$ is the long-run proportion of time that exactly $i$ of the machines are
in working condition.


Solution: Let us say that the system is in state $i$ whenever $i$ machines are in
working condition, $i = 0, 1, 2$. It is then easy to see that every time the system
enters state 1 it probabilistically starts over. That is, the system restarts every
time that a machine is put in use while, simultaneously, repair begins on the
other one. Say that a cycle begins each time the system enters state 1. If we
let $X$ denote the working time of the machine put in use at the beginning of a
cycle, and let $R$ be the repair time of the other machine, then the length of the
cycle, call it $T_c$, can be expressed as

$$T_c = \max(X, R)$$

The preceding follows when $X < R$, because, in this case, the machine in use
fails before the other one has been repaired, and so a new cycle begins when
that repair is completed. Similarly, it follows when $R < X$, because then the
repair occurs first, and so a new cycle begins when the machine in use fails.
Also, let $T_i$, $i = 0, 1, 2$, be the amount of time that the system is in state $i$ during
a cycle. Then, because the amount of time during a cycle that neither machine
is working is $R - X$ provided that this quantity is positive or 0 otherwise,
we have

$$T_0 = (R - X)^+$$

Similarly, because the amount of time during the cycle that a single machine is
working is $\min(X, R)$, we have

$$T_1 = \min(X, R)$$

Finally, because the amount of time during the cycle that both machines are
working is $X - R$ if this quantity is positive or 0 otherwise, we have

$$T_2 = (X - R)^+$$

Hence, we obtain

$$P_0 = \frac{E[(R - X)^+]}{E[\max(X, R)]}, \quad P_1 = \frac{E[\min(X, R)]}{E[\max(X, R)]}, \quad P_2 = \frac{E[(X - R)^+]}{E[\max(X, R)]}$$

That $P_0 + P_1 + P_2 = 1$ follows from the easily checked identity

$$\max(x, r) = \min(x, r) + (x - r)^+ + (r - x)^+$$

The preceding expectations can be computed as follows:

$$\begin{aligned}
E[\max(X, R)] &= \int_0^\infty \int_0^\infty \max(x, r) f(x) g(r)\,dx\,dr \\
&= \int_0^\infty \int_0^r r f(x) g(r)\,dx\,dr + \int_0^\infty \int_r^\infty x f(x) g(r)\,dx\,dr \\
E[(R - X)^+] &= \int_0^\infty \int_0^\infty (r - x)^+ f(x) g(r)\,dx\,dr \\
&= \int_0^\infty \int_0^r (r - x) f(x) g(r)\,dx\,dr \\
E[\min(X, R)] &= \int_0^\infty \int_0^\infty \min(x, r) f(x) g(r)\,dx\,dr \\
&= \int_0^\infty \int_0^r x f(x) g(r)\,dx\,dr + \int_0^\infty \int_r^\infty r f(x) g(r)\,dx\,dr \\
E[(X - R)^+] &= \int_0^\infty \int_0^x (x - r) f(x) g(r)\,dr\,dx
\end{aligned}$$
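For a concrete instance of Example 7.20, suppose (purely as an illustrative assumption, since the example allows arbitrary densities $f$ and $g$) that working times are exponential with rate `a` and repair times are exponential with rate `b`. The double integrals then have simple closed forms by the memoryless property, and the three proportions can be computed directly:

```python
def two_machine_proportions(a, b):
    """(P_0, P_1, P_2) when X ~ exponential(a) and R ~ exponential(b).

    The exponential choice is an illustrative assumption; the example
    itself allows arbitrary densities f and g.
    """
    e_min = 1.0 / (a + b)                       # E[min(X, R)]
    e_x_minus_r = (b / (a + b)) * (1.0 / a)     # E[(X - R)^+] = P(X > R) * E[X]
    e_r_minus_x = (a / (a + b)) * (1.0 / b)     # E[(R - X)^+] = P(R > X) * E[R]
    e_max = e_min + e_x_minus_r + e_r_minus_x   # identity for max(x, r)
    return e_r_minus_x / e_max, e_min / e_max, e_x_minus_r / e_max
```

With equal rates `a = b = 1` the three expectations are each 1/2, so the system spends one third of the time in each state.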


7.5.1 Alternating Renewal Processes 


Another example of a regenerative process is provided by what is known as an
alternating renewal process, which considers a system that can be in one of two
states: on or off. Initially it is on, and it remains on for a time $Z_1$; it then goes off
and remains off for a time $Y_1$. It then goes on for a time $Z_2$; then off for a time
$Y_2$; then on, and so on.

We suppose that the random vectors $(Z_n, Y_n)$, $n \ge 1$ are independent and
identically distributed. That is, both the sequence of random variables $\{Z_n\}$ and
the sequence $\{Y_n\}$ are independent and identically distributed; but we allow $Z_n$
and $Y_n$ to be dependent. In other words, each time the process goes on, everything
starts over again, but when it then goes off, we allow the length of the off time
to depend on the previous on time.

Let $E[Z] = E[Z_n]$ and $E[Y] = E[Y_n]$ denote, respectively, the mean lengths of
an on and off period.

We are concerned with $P_{\text{on}}$, the long-run proportion of time that the system is
on. If we let

$$X_n = Y_n + Z_n, \quad n \ge 1$$

then at time $X_1$ the process starts over again. That is, the process starts over
again after a complete cycle consisting of an on and an off interval. In other
words, a renewal occurs whenever a cycle is completed. Therefore, we obtain
from Proposition 7.4 that

$$P_{\text{on}} = \frac{E[Z]}{E[Y] + E[Z]} = \frac{E[\text{on}]}{E[\text{on}] + E[\text{off}]} \tag{7.15}$$

Also, if we let $P_{\text{off}}$ denote the long-run proportion of time that the system is off,
then

$$P_{\text{off}} = 1 - P_{\text{on}} = \frac{E[\text{off}]}{E[\text{on}] + E[\text{off}]} \tag{7.16}$$


Example 7.21 (A Production Process) One example of an alternating renewal
process is a production process (or a machine) that works for a time $Z_1$, then
breaks down and has to be repaired (which takes a time $Y_1$), then works for a
time $Z_2$, then is down for a time $Y_2$, and so on. If we suppose that the process
is as good as new after each repair, then this constitutes an alternating renewal
process. It is worthwhile to note that in this example it makes sense to suppose
that the repair time will depend on the amount of time the process had been
working before breaking down.


Example 7.22 The rate a certain insurance company charges its policyholders 
alternates between r; and ro. A new policyholder is initially charged at a rate of 
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r, per unit time. When a policyholder paying at rate r; has made no claims for 
the most recent s time units, then the rate charged becomes rg per unit time. The 
rate charged remains at 79 until a claim is made, at which time it reverts to 71. 
Suppose that a given policyholder lives forever and makes claims at times chosen 
according to a Poisson process with rate A, and find 

(a) Pj, the proportion of time that the policyholder pays at rate rj, i = 0, 1; 

(b) the long-run average amount paid per unit time. 


Solution: If we say that the system is “on” when the policyholder pays at rate
$r_1$ and “off” when she pays at rate $r_0$, then this on-off system is an alternating
renewal process with a new cycle starting each time a claim is made. If $X$ is
the time between successive claims, then the on time in the cycle is the smaller
of $s$ and $X$. (Note that if $X < s$, then the off time in the cycle is 0.) Since $X$ is
exponential with rate $\lambda$, the preceding yields

$$\begin{aligned}
E[\text{on time in cycle}] &= E[\min(X, s)] \\
&= \int_0^s x \lambda e^{-\lambda x}\,dx + s e^{-\lambda s} \\
&= \frac{1}{\lambda}\left(1 - e^{-\lambda s}\right)
\end{aligned}$$

Since $E[X] = 1/\lambda$, we see that

$$P_1 = \frac{E[\text{on time in cycle}]}{E[X]} = 1 - e^{-\lambda s}$$

and

$$P_0 = 1 - P_1 = e^{-\lambda s}$$

The long-run average amount paid per unit time is

$$r_0 P_0 + r_1 P_1 = r_1 - (r_1 - r_0)e^{-\lambda s}$$
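The formulas of Example 7.22 are easy to check numerically. The following is a minimal sketch, not part of the original text: the function names and the parameter values are illustrative assumptions, and the numerical integral uses the identity $E[\min(X, s)] = \int_0^s P\{X > x\}\,dx$.

```python
import math


def rate_proportions(lam, s):
    """Return (P0, P1): long-run fractions of time at rates r0 and r1."""
    p1 = 1.0 - math.exp(-lam * s)      # E[min(X, s)] / E[X], with E[X] = 1/lam
    return 1.0 - p1, p1


def average_payment(r0, r1, lam, s):
    """Long-run average amount paid per unit time, r0*P0 + r1*P1."""
    p0, p1 = rate_proportions(lam, s)
    return r0 * p0 + r1 * p1


def expected_on_time_numeric(lam, s, steps=100_000):
    """Midpoint-rule estimate of E[min(X, s)] = integral over [0, s] of P{X > x}."""
    h = s / steps
    return h * sum(math.exp(-lam * (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    lam, s = 0.5, 2.0                  # illustrative values (assumption)
    print(rate_proportions(lam, s))
    print(average_payment(100.0, 150.0, lam, s))
    # The two numbers printed below should agree closely.
    print(expected_on_time_numeric(lam, s), (1.0 - math.exp(-lam * s)) / lam)
```

The closed form and the numerical integral agree, which confirms that the on time in a cycle has mean $(1 - e^{-\lambda s})/\lambda$.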


Example 7.23 (The Age of a Renewal Process) Suppose we are interested in
determining the proportion of time that the age of a renewal process is less than
some constant $c$. To do so, let a cycle correspond to a renewal, and say that the
system is “on” at time $t$ if the age at $t$ is less than or equal to $c$, and say it is
“off” if the age at $t$ is greater than $c$. In other words, the system is “on” the first
$c$ time units of a renewal interval, and “off” the remaining time. Hence, letting
$X$ denote a renewal interval, we have, from Equation (7.15),

$$\begin{aligned}
\text{proportion of time age is less than } c &= \frac{E[\min(X, c)]}{E[X]} \\
&= \frac{\int_0^\infty P\{\min(X, c) > x\}\,dx}{E[X]} \\
&= \frac{\int_0^c P\{X > x\}\,dx}{E[X]} \\
&= \frac{\int_0^c (1 - F(x))\,dx}{E[X]} \qquad (7.17)
\end{aligned}$$

where $F$ is the distribution function of $X$ and where we have used the identity
that for a nonnegative random variable $Y$

$$E[Y] = \int_0^\infty P\{Y > x\}\,dx$$

Example 7.24 (The Excess of a Renewal Process) Let us now consider the long- 
run proportion of time that the excess of a renewal process is less than c. To 
determine this quantity, let a cycle correspond to a renewal interval and say that 
the system is on whenever the excess of the renewal process is greater than or 
equal to c and that it is off otherwise. In other words, whenever a renewal occurs 
the process goes on and stays on until the last c time units of the renewal interval 
when it goes off. Clearly this is an alternating renewal process, and so we obtain
from Equation (7.16) that

$$\text{long-run proportion of time the excess is less than } c = \frac{E[\text{off time in cycle}]}{E[\text{cycle time}]}$$


If $X$ is the length of a renewal interval, then since the system is off the last
$c$ time units of this interval, it follows that the off time in the cycle will equal
$\min(X, c)$. Thus,

$$\begin{aligned}
\text{long-run proportion of time the excess is less than } c &= \frac{E[\min(X, c)]}{E[X]} \\
&= \frac{\int_0^c (1 - F(x))\,dx}{E[X]}
\end{aligned}$$


where the final equality follows from Equation (7.17). Thus, we see from the 
result of Example 7.23 that the long-run proportion of time that the excess is 
less than c and the long-run proportion of time that the age is less than c are 
equal. One way to understand this equivalence is to consider a renewal process 
that has been in operation for a long time and then observe it going backwards 
in time. In doing so, we observe a counting process where the times between 
successive events are independent random variables having distribution F. That 
is, when we observe a renewal process going backwards in time we again observe 
a renewal process having the same probability structure as the original. Since the 
excess (age) at any time for the backwards process corresponds to the age (excess)
at that time for the original renewal process (see Figure 7.3), it follows that all
long-run properties of the age and the excess must be equal.

Figure 7.3. Arrowheads indicate direction of time. (The figure marks a time point $t$, the last renewal before $t$, the first renewal after $t$, and the excess $Y(t)$.)
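The equality of the two long-run proportions can be observed empirically. The sketch below is an illustration under stated assumptions, not the text's own method: it takes Uniform(0, 2) interarrival times (so $E[X] = 1$ and $\int_0^c (1 - F(x))\,dx = c - c^2/4$ for $c \le 2$), samples time points uniformly over a long horizon, and records how often the age and the excess fall below $c$. The function name and default arguments are hypothetical.

```python
import bisect
import random


def age_and_excess_fractions(c, horizon=50_000.0, n_samples=100_000, seed=7):
    """Estimate the fractions of time that the age and the excess are below c."""
    rng = random.Random(seed)
    # Renewal epochs with Uniform(0, 2) interarrival times (an assumption).
    epochs = [0.0]
    while epochs[-1] <= horizon:
        epochs.append(epochs[-1] + rng.uniform(0.0, 2.0))
    age_hits = 0
    excess_hits = 0
    for _ in range(n_samples):
        t = rng.uniform(0.0, horizon)
        j = bisect.bisect_right(epochs, t)     # epochs[j-1] <= t < epochs[j]
        if t - epochs[j - 1] < c:              # age at time t
            age_hits += 1
        if epochs[j] - t < c:                  # excess at time t
            excess_hits += 1
    return age_hits / n_samples, excess_hits / n_samples


if __name__ == "__main__":
    c = 0.5
    print(age_and_excess_fractions(c))
    print(c - c * c / 4.0)                     # value given by Equation (7.17)
```

Both estimates should be close to each other and to the value given by Equation (7.17).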


Example 7.25 (The Busy Period of the M/G/∞ Queue) The infinite server queueing
system in which customers arrive according to a Poisson process with rate
$\lambda$, and have a general service distribution $G$, was analyzed in Section 5.3, where
it was shown that the number of customers in the system at time $t$ is Poisson
distributed with mean $\lambda \int_0^t (1 - G(y))\,dy$. If we say that the system is busy when there
is at least one customer in the system and is idle when the system is empty, find
$E[B]$, the expected length of a busy period.


Solution: If we say that the system is on when there is at least one customer 
in the system, and off when the system is empty, then we have an alternating 
renewal process. Because $\int_0^\infty (1 - G(t))\,dt = E[S]$, where $E[S]$ is the mean of the
service distribution $G$, it follows from the result of Section 5.3 that

$$\lim_{t \to \infty} P\{\text{system off at } t\} = e^{-\lambda E[S]}$$

Consequently, from alternating renewal process theory we obtain

$$e^{-\lambda E[S]} = \frac{E[\text{off time in cycle}]}{E[\text{cycle time}]}$$

But when the system goes off, it remains off only up to the time of the next
arrival, giving that

$$E[\text{off time in cycle}] = 1/\lambda$$

Because

$$E[\text{on time in cycle}] = E[B]$$

we obtain

$$e^{-\lambda E[S]} = \frac{1/\lambda}{1/\lambda + E[B]}$$

or

$$E[B] = \frac{1}{\lambda}\left(e^{\lambda E[S]} - 1\right)$$
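As a sanity check on the busy period formula, the following sketch simulates busy periods of an infinite server queue directly. It is an illustration only: the function names are hypothetical, and the Uniform(0, 2) service distribution in the usage lines is an assumption chosen so that $E[S] = 1$.

```python
import math
import random


def expected_busy_period(lam, mean_service):
    """E[B] = (exp(lam * E[S]) - 1) / lam, the result of Example 7.25."""
    return (math.exp(lam * mean_service) - 1.0) / lam


def simulate_mean_busy_period(lam, draw_service, n_periods=100_000, seed=3):
    """Average length of simulated busy periods with Poisson(lam) arrivals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_periods):
        t = 0.0                        # the busy period starts with an arrival at time 0
        end = draw_service(rng)        # latest departure time among customers so far
        while True:
            t += rng.expovariate(lam)  # epoch of the next arrival
            if t > end:                # the system emptied before this arrival
                break
            end = max(end, t + draw_service(rng))
        total += end                   # busy period length
    return total / n_periods


if __name__ == "__main__":
    lam = 1.0
    print(simulate_mean_busy_period(lam, lambda rng: rng.uniform(0.0, 2.0)))
    print(expected_busy_period(lam, 1.0))
```

Only the mean of the service distribution enters the formula, so other service distributions with the same mean should give a similar simulated average.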


If $\mu$ is the mean interarrival time, then the distribution function $F_e$, defined by

$$F_e(x) = \int_0^x \frac{1 - F(y)}{\mu}\,dy$$

is called the equilibrium distribution of $F$. From the preceding, it follows that
$F_e(x)$ represents the long-run proportion of time that the age, and the excess, of
the renewal process is less than or equal to $x$.


Example 7.26 (An Inventory Example) Suppose that customers arrive at a speci- 
fied store in accordance with a renewal process having interarrival distribution F. 
Suppose that the store stocks a single type of item and that each arriving customer 
desires a random amount of this commodity, with the amounts desired by the 
different customers being independent random variables having the common dis- 
tribution G. The store uses the following (s,S) ordering policy: If its inventory 
level falls below s then it orders enough to bring its inventory up to S. That is, if 
the inventory after serving a customer is x, then the amount ordered is 


$$\begin{cases} S - x, & \text{if } x < s \\ 0, & \text{if } x \ge s \end{cases}$$


The order is assumed to be instantaneously filled. 

For a fixed value $y$, $s \le y \le S$, suppose that we are interested in determining
the long-run proportion of time that the inventory on hand is at least as large as $y$.
To determine this quantity, let us say that the system is “on” whenever the inven- 
tory level is at least y and is “off” otherwise. With these definitions, the system 
will go on each time that a customer’s demand causes the store to place an order 
that results in its inventory level returning to S. Since whenever this occurs a 
customer must have just arrived it follows that the times until succeeding cus- 
tomers arrive will constitute a renewal process with interarrival distribution F; 
that is, the process will start over each time the system goes back on. Thus, the 
on and off periods so defined constitute an alternating renewal process, and from 
Equation (7.15) we have that

$$\text{long-run proportion of time inventory} \ge y = \frac{E[\text{on time in a cycle}]}{E[\text{cycle time}]} \qquad (7.18)$$

Now, if we let $D_1, D_2, \ldots$ denote the successive customer demands, and let

$$N_x = \min(n: D_1 + \cdots + D_n > S - x) \qquad (7.19)$$

then it is the $N_y$th customer in the cycle that causes the inventory level to fall below
$y$, and it is the $N_s$th customer that ends the cycle. As a result, if we let $X_i$, $i \ge 1$,
denote the interarrival times of customers, then

$$\text{on time in a cycle} = \sum_{i=1}^{N_y} X_i \qquad (7.20)$$

$$\text{cycle time} = \sum_{i=1}^{N_s} X_i \qquad (7.21)$$


Assuming that the interarrival times are independent of the successive demands,
we have that

$$E\left[\sum_{i=1}^{N_y} X_i\right] = E\left[E\left[\sum_{i=1}^{N_y} X_i \,\Big|\, N_y\right]\right] = E[N_y E[X]] = E[X]E[N_y]$$

Similarly,

$$E\left[\sum_{i=1}^{N_s} X_i\right] = E[X]E[N_s]$$

Therefore, from Equations (7.18), (7.20), and (7.21) we see that

$$\text{long-run proportion of time inventory} \ge y = \frac{E[N_y]}{E[N_s]} \qquad (7.22)$$


However, as the $D_i$, $i \ge 1$, are independent and identically distributed nonnegative
random variables with distribution $G$, it follows from Equation (7.19) that
$N_x$ has the same distribution as the index of the first event to occur after time $S - x$
of a renewal process having interarrival distribution $G$. That is, $N_x - 1$ would be
the number of renewals by time $S - x$ of this process. Hence, we see that

$$E[N_y] = m(S - y) + 1, \qquad E[N_s] = m(S - s) + 1$$

where

$$m(t) = \sum_{n=1}^{\infty} G_n(t)$$

From Equation (7.22), we arrive at

$$\text{long-run proportion of time inventory} \ge y = \frac{m(S - y) + 1}{m(S - s) + 1}, \quad s \le y \le S$$

For instance, if the customer demands are exponentially distributed with mean
$1/\mu$, then

$$\text{long-run proportion of time inventory} \ge y = \frac{\mu(S - y) + 1}{\mu(S - s) + 1}, \quad s \le y \le S$$
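The exponential-demand formula can be compared with a direct simulation of the $(s, S)$ policy. This is a hedged sketch: the function names, the choice of exponential interarrival times with rate 1, and the parameter values are all assumptions made for illustration (the result does not depend on the interarrival distribution).

```python
import random


def inventory_formula(s, S, y, mu):
    """(mu*(S - y) + 1) / (mu*(S - s) + 1) for exponential demands with mean 1/mu."""
    return (mu * (S - y) + 1.0) / (mu * (S - s) + 1.0)


def simulate_inventory_fraction(s, S, y, mu, n_customers=200_000, seed=11):
    """Fraction of time the inventory level is at least y under the (s, S) policy."""
    rng = random.Random(seed)
    level = S
    on_time = 0.0
    total_time = 0.0
    for _ in range(n_customers):
        gap = rng.expovariate(1.0)        # time until the next customer (assumption)
        total_time += gap
        if level >= y:                    # level is constant between arrivals
            on_time += gap
        level -= rng.expovariate(mu)      # demand with mean 1/mu
        if level < s:                     # instantaneous reorder up to S
            level = S
    return on_time / total_time


if __name__ == "__main__":
    s, S, y, mu = 2.0, 10.0, 5.0, 1.0     # illustrative values
    print(simulate_inventory_fraction(s, S, y, mu))
    print(inventory_formula(s, S, y, mu))
```

With these values the formula gives $6/9$, and the simulated fraction should be close to it.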


7.6 Semi-Markov Processes 


Consider a process that can be in state 1 or state 2 or state 3. It is initially in
state 1 where it remains for a random amount of time having mean $\mu_1$, then it
goes to state 2 where it remains for a random amount of time having mean $\mu_2$,
then it goes to state 3 where it remains for a mean time $\mu_3$, then back to state 1,
and so on. What proportion of time is the process in state $i$, $i = 1, 2, 3$?

If we say that a cycle is completed each time the process returns to state 1, and 
if we let the reward be the amount of time we spend in state i during that cycle, 
then the preceding is a renewal reward process. Hence, from Proposition 7.3 we 
obtain that $P_i$, the proportion of time that the process is in state $i$, is given by

$$P_i = \frac{\mu_i}{\mu_1 + \mu_2 + \mu_3}, \quad i = 1, 2, 3$$

Similarly, if we had a process that could be in any of $N$ states $1, 2, \ldots, N$ and
that moved from state $1 \to 2 \to 3 \to \cdots \to N - 1 \to N \to 1$, then the long-run
proportion of time that the process spends in state $i$ is

$$P_i = \frac{\mu_i}{\mu_1 + \mu_2 + \cdots + \mu_N}, \quad i = 1, 2, \ldots, N$$


where $\mu_i$ is the expected amount of time the process spends in state $i$ during each
visit.

Let us now generalize the preceding to the following situation. Suppose that a
process can be in any one of $N$ states $1, 2, \ldots, N$, and that each time it enters
state $i$ it remains there for a random amount of time having mean $\mu_i$ and then
makes a transition into state $j$ with probability $P_{ij}$. Such a process is called a
semi-Markov process. Note that if the amount of time that the process spends
in each state before making a transition is identically 1, then the semi-Markov
process is just a Markov chain.
Let us calculate $P_i$ for a semi-Markov process. To do so, we first consider
$\pi_i$, the proportion of transitions that take the process into state $i$. Now, if we
let $X_n$ denote the state of the process after the $n$th transition, then $\{X_n, n \ge 0\}$
is a Markov chain with transition probabilities $P_{ij}$, $i, j = 1, 2, \ldots, N$. Hence,
$\pi_i$ will just be the limiting (or stationary) probabilities for this Markov chain
(Section 4.4). That is, $\pi_i$ will be the unique nonnegative solution* of

$$\sum_{i=1}^{N} \pi_i = 1, \qquad \pi_i = \sum_{j=1}^{N} \pi_j P_{ji}, \quad i = 1, 2, \ldots, N \qquad (7.23)$$

Now, since the process spends an expected time $\mu_i$ in state $i$ whenever it visits
that state, it seems intuitive that $P_i$ should be a weighted average of the $\pi_i$ where
$\pi_i$ is weighted proportionately to $\mu_i$. That is,

$$P_i = \frac{\pi_i \mu_i}{\sum_{j=1}^{N} \pi_j \mu_j}, \quad i = 1, 2, \ldots, N \qquad (7.24)$$

where the $\pi_i$ are given as the solution to Equation (7.23).


Example 7.27 Consider a machine that can be in one of three states: good condition,
fair condition, or broken down. Suppose that a machine in good condition
will remain this way for a mean time $\mu_1$ and then will go to either the fair condition
or the broken condition with respective probabilities $\frac{3}{4}$ and $\frac{1}{4}$. A machine
in fair condition will remain that way for a mean time $\mu_2$ and then will break
down. A broken machine will be repaired, which takes a mean time $\mu_3$, and
when repaired will be in good condition with probability $\frac{2}{3}$ and fair condition
with probability $\frac{1}{3}$. What proportion of time is the machine in each state?


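Solution: Letting the states be 1, 2, 3, we have by Equation (7.23) that the $\pi_i$
satisfy

$$\begin{aligned}
\pi_1 + \pi_2 + \pi_3 &= 1, \\
\pi_1 &= \tfrac{2}{3}\pi_3, \\
\pi_2 &= \tfrac{3}{4}\pi_1 + \tfrac{1}{3}\pi_3, \\
\pi_3 &= \tfrac{1}{4}\pi_1 + \pi_2
\end{aligned}$$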

* We shall assume that there exists a solution of Equation (7.23). That is, we assume that all of the 
states in the Markov chain communicate. 
The solution is

$$\pi_1 = \frac{4}{15}, \qquad \pi_2 = \frac{1}{3}, \qquad \pi_3 = \frac{2}{5}$$

Hence, from Equation (7.24) we obtain that $P_i$, the proportion of time the
machine is in state $i$, is given by

$$P_1 = \frac{4\mu_1}{4\mu_1 + 5\mu_2 + 6\mu_3}, \qquad
P_2 = \frac{5\mu_2}{4\mu_1 + 5\mu_2 + 6\mu_3}, \qquad
P_3 = \frac{6\mu_3}{4\mu_1 + 5\mu_2 + 6\mu_3}$$

For instance, if $\mu_1 = 5$, $\mu_2 = 2$, $\mu_3 = 1$, then the machine will be in good
condition $\frac{5}{9}$ of the time, in fair condition $\frac{5}{18}$ of the time, in broken condition
$\frac{1}{6}$ of the time.
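The two steps of the semi-Markov calculation, solving Equation (7.23) for the $\pi_i$ and then weighting by the mean sojourn times as in Equation (7.24), can be sketched in a few lines. The helper names below are hypothetical and the exact-arithmetic elimination is just one convenient way to solve the linear system; the transition probabilities and means are those of Example 7.27.

```python
from fractions import Fraction


def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 exactly, by Gauss-Jordan elimination."""
    n = len(P)
    # Rows 0..n-2: sum_j pi_j P[j][i] - pi_i = 0.  Last row: normalization.
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n - 1)]
    A.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def time_proportions(P, mu):
    """Long-run proportions of time per state, as in Equation (7.24)."""
    pi = stationary_distribution(P)
    weights = [p * m for p, m in zip(pi, mu)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    F = Fraction
    # States: 0 = good, 1 = fair, 2 = broken (Example 7.27).
    P = [[F(0), F(3, 4), F(1, 4)],
         [F(0), F(0), F(1)],
         [F(2, 3), F(1, 3), F(0)]]
    print(stationary_distribution(P))      # pi = 4/15, 1/3, 2/5
    print(time_proportions(P, [5, 2, 1]))  # 5/9, 5/18, 1/6
```

Using exact fractions makes the output directly comparable with the values in the example.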


Remark When the distributions of the amount of time spent in each state during
a visit are continuous, then $P_i$ also represents the limiting (as $t \to \infty$) probability
that the process will be in state $i$ at time $t$.


Example 7.28 Consider a renewal process in which the interarrival distribution
is discrete and is such that

$$P\{X = i\} = p_i, \quad i \ge 1$$

where $X$ represents an interarrival random variable. Let $L(t)$ denote the length
of the renewal interval that contains the point $t$ (that is, if $N(t)$ is the number
of renewals by time $t$ and $X_n$ the $n$th interarrival time, then $L(t) = X_{N(t)+1}$). If
we think of each renewal as corresponding to the failure of a lightbulb (which is
then replaced at the beginning of the next period by a new bulb), then $L(t)$ will
equal $i$ if the bulb in use at time $t$ dies in its $i$th period of use.

It is easy to see that $L(t)$ is a semi-Markov process. To determine the proportion
of time that $L(t) = j$, note that each time a transition occurs (that is, each time a
renewal occurs) the next state will be $j$ with probability $p_j$. That is, the transition
probabilities of the embedded Markov chain are $P_{ij} = p_j$. Hence, the limiting
probabilities of this embedded chain are given by

$$\pi_j = p_j$$

and, since the mean time the semi-Markov process spends in state $j$ before a
transition occurs is $j$, it follows that the long-run proportion of time the state
is $j$ is

$$P_j = \frac{j p_j}{\sum_i i p_i}$$
7.7 The Inspection Paradox 


Suppose that a piece of equipment, say, a battery, is installed and serves until it
breaks down. Upon failure it is instantly replaced by a like battery, and this process
continues without interruption. Letting $N(t)$ denote the number of batteries
that have failed by time $t$, we have that $\{N(t), t \ge 0\}$ is a renewal process.

Suppose further that the distribution $F$ of the lifetime of a battery is not known
and is to be estimated by the following sampling inspection scheme. We fix some
time $t$ and observe the total lifetime of the battery that is in use at time $t$. Since $F$ is
the distribution of the lifetime for all batteries, it seems reasonable that it should
be the distribution for this battery. However, this is the inspection paradox, for
it turns out that the battery in use at time $t$ tends to have a larger lifetime than
an ordinary battery.

To understand the preceding so-called paradox, we reason as follows. In
renewal theoretic terms what we are interested in is the length of the renewal
interval containing the point $t$. That is, we are interested in $X_{N(t)+1} = S_{N(t)+1} - S_{N(t)}$
(see Figure 7.2). To calculate the distribution of $X_{N(t)+1}$ we condition on the time
of the last renewal prior to (or at) time $t$. That is,

$$P\{X_{N(t)+1} > x\} = E\left[P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\}\right]$$

where we recall (Figure 7.2) that $S_{N(t)}$ is the time of the last renewal prior to (or
at) $t$. Since there are no renewals between $t - s$ and $t$, it follows that $X_{N(t)+1}$ must
be larger than $x$ if $s > x$. That is,

$$P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\} = 1 \quad \text{if } s > x \qquad (7.25)$$

On the other hand, suppose that $s \le x$. As before, we know that a renewal
occurred at time $t - s$ and no additional renewals occurred between $t - s$ and
$t$, and we ask for the probability that no renewals occur for an additional time
$x - s$. That is, we are asking for the probability that an interarrival time will be
greater than $x$ given that it is greater than $s$. Therefore, for $s \le x$,

$$\begin{aligned}
P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\}
&= P\{\text{interarrival time} > x \mid \text{interarrival time} > s\} \\
&= P\{\text{interarrival time} > x\}/P\{\text{interarrival time} > s\} \\
&= \frac{1 - F(x)}{1 - F(s)} \\
&\ge 1 - F(x) \qquad (7.26)
\end{aligned}$$

Hence, from Equations (7.25) and (7.26) we see that, for all $s$,

$$P\{X_{N(t)+1} > x \mid S_{N(t)} = t - s\} \ge 1 - F(x)$$

Taking expectations on both sides yields

$$P\{X_{N(t)+1} > x\} \ge 1 - F(x) \qquad (7.27)$$

However, $1 - F(x)$ is the probability that an ordinary renewal interval is larger
than $x$, that is, $1 - F(x) = P\{X_n > x\}$, and thus Equation (7.27) is a statement
of the inspection paradox that the length of the renewal interval containing the
point $t$ tends to be larger than an ordinary renewal interval.


Remark To obtain an intuitive feel for the so-called inspection paradox, reason 
as follows. We think of the whole line being covered by renewal intervals, one of 
which covers the point t. Is it not more likely that a larger interval, as opposed 
to a shorter interval, covers the point t? 


We can explicitly calculate the distribution of $X_{N(t)+1}$ when the renewal process
is a Poisson process. (Note that, in the general case, we did not need to calculate
explicitly $P\{X_{N(t)+1} > x\}$ to show that it was at least as large as $1 - F(x)$.) To do
so we write

$$X_{N(t)+1} = A(t) + Y(t)$$

where $A(t)$ denotes the time from $t$ since the last renewal, and $Y(t)$ denotes the
time from $t$ until the next renewal (see Figure 7.4). $A(t)$ is the age of the process
at time $t$ (in our example it would be the age at time $t$ of the battery in use at
time $t$), and $Y(t)$ is the excess life of the process at time $t$ (it is the additional
time from $t$ until the battery fails). Of course, it is true that $A(t) = t - S_{N(t)}$, and
$Y(t) = S_{N(t)+1} - t$.

Figure 7.4. (A time axis showing the age $A(t)$ immediately before the point $t$ and the excess $Y(t)$ immediately after it.)

To calculate the distribution of $X_{N(t)+1}$ we first note the important fact that,
for a Poisson process, $A(t)$ and $Y(t)$ are independent. This follows since by the
memoryless property of the Poisson process, the time from $t$ until the next renewal
will be exponentially distributed and will be independent of all that has previously
occurred (including, in particular, $A(t)$). In fact, this shows that if $\{N(t), t \ge 0\}$
is a Poisson process with rate $\lambda$, then

$$P\{Y(t) \le x\} = 1 - e^{-\lambda x} \qquad (7.28)$$

The distribution of $A(t)$ may be obtained as follows:

$$P\{A(t) > x\} = \begin{cases} P\{0 \text{ renewals in } [t - x, t]\}, & \text{if } x < t \\ 0, & \text{if } x \ge t \end{cases}
= \begin{cases} e^{-\lambda x}, & \text{if } x < t \\ 0, & \text{if } x \ge t \end{cases}$$

or, equivalently,

$$P\{A(t) \le x\} = \begin{cases} 1 - e^{-\lambda x}, & x < t \\ 1, & x \ge t \end{cases} \qquad (7.29)$$


Hence, by the independence of $Y(t)$ and $A(t)$ the distribution of $X_{N(t)+1}$ is just
the convolution of the exponential distribution seen in Equation (7.28) and the
distribution of Equation (7.29). It is interesting to note that for $t$ large, $A(t)$
approximately has an exponential distribution. Thus, for $t$ large, $X_{N(t)+1}$ has
the distribution of the convolution of two identically distributed exponential
random variables, which by Section 5.2.3 is the gamma distribution with parameters
$(2, \lambda)$. In particular, for $t$ large, the expected length of the renewal interval
containing the point $t$ is approximately twice the expected length of an ordinary
renewal interval.

Using the results obtained in Examples 7.16 and 7.17 concerning the average
values of the age and of the excess, it follows from the identity

$$X_{N(t)+1} = A(t) + Y(t)$$

that the average length of the renewal interval containing a specified point is

$$\lim_{s \to \infty} \frac{\int_0^s X_{N(t)+1}\,dt}{s} = \frac{E[X^2]}{E[X]}$$

where $X$ has the interarrival distribution. Because, except for when $X$ is a constant,
$E[X^2] > (E[X])^2$, this average value is, as expected from the inspection
paradox, greater than the expected value of an ordinary renewal interval.
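A short numerical illustration of this average follows. It is a sketch under stated assumptions: exponential interarrival times with rate 2 are chosen arbitrarily, and the function name is hypothetical. Over the span of $n$ complete renewal intervals, the time average of $X_{N(t)+1}$ equals $\sum_i X_i^2 / \sum_i X_i$, because an interval of length $X_i$ covers the point $t$ for $X_i$ time units.

```python
import random


def inspection_demo(rate=2.0, n=200_000, seed=5):
    """Return (ordinary mean interval length, time-average covering interval length)."""
    rng = random.Random(seed)
    intervals = [rng.expovariate(rate) for _ in range(n)]
    total = sum(intervals)
    ordinary_mean = total / n
    # Each interval of length x contributes x for a duration of x to the time average.
    covering_mean = sum(x * x for x in intervals) / total
    return ordinary_mean, covering_mean


if __name__ == "__main__":
    ordinary, covering = inspection_demo()
    print(ordinary)   # close to E[X] = 1/2
    print(covering)   # close to E[X^2]/E[X] = 1, twice the ordinary mean
```

For exponential interarrival times the ratio is 2, in agreement with the gamma $(2, \lambda)$ limit noted above for the Poisson process.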

We can use an alternating renewal process argument to determine the long-run
proportion of time that $X_{N(t)+1}$ is greater than $c$. To do so, let a cycle correspond
to a renewal interval, and say that the system is on at time $t$ if the renewal interval
containing $t$ is of length greater than $c$ (that is, if $X_{N(t)+1} > c$), and say that the
system is off at time $t$ otherwise. In other words, the system is always on during a
cycle if the cycle time exceeds $c$ or is always off during the cycle if the cycle time
is less than $c$. Thus, if $X$ is the cycle time, we have

$$\text{on time in cycle} = \begin{cases} X, & \text{if } X > c \\ 0, & \text{if } X \le c \end{cases}$$

Therefore, we obtain from alternating renewal process theory that

$$\begin{aligned}
\text{long-run proportion of time } X_{N(t)+1} > c &= \frac{E[\text{on time in cycle}]}{E[\text{cycle time}]} \\
&= \frac{\int_c^\infty x f(x)\,dx}{\mu}
\end{aligned}$$

where $f$ is the density function of an interarrival and $\mu$ is its mean.


7.8 Computing the Renewal Function 


The difficulty with attempting to use the identity

$$m(t) = \sum_{n=1}^{\infty} F_n(t)$$

to compute the renewal function is that the determination of $F_n(t) = P\{X_1 + \cdots + X_n \le t\}$
requires the computation of an $n$-dimensional integral. In what follows,
we present an effective algorithm that requires as inputs only one-dimensional
integrals.

Let $Y$ be an exponential random variable having rate $\lambda$, and suppose that $Y$
is independent of the renewal process $\{N(t), t \ge 0\}$. We start by determining
$E[N(Y)]$, the expected number of renewals by the random time $Y$. To do so, we
first condition on $X_1$, the time of the first renewal. This yields

$$E[N(Y)] = \int_0^\infty E[N(Y) \mid X_1 = x]\, f(x)\,dx \qquad (7.30)$$

where $f$ is the interarrival density. To determine $E[N(Y) \mid X_1 = x]$, we now condition
on whether or not $Y$ exceeds $x$. Now, if $Y < x$, then as the first renewal
occurs at time $x$, it follows that the number of renewals by time $Y$ is equal to 0.
On the other hand, if we are given that $x < Y$, then the number of renewals by
time $Y$ will equal 1 (the one at $x$) plus the number of additional renewals between
$x$ and $Y$. But by the memoryless property of exponential random variables, it follows
that, given that $Y > x$, the amount by which it exceeds $x$ is also exponential
with rate $\lambda$, and so given that $Y > x$ the number of renewals between $x$ and $Y$
will have the same distribution as $N(Y)$. Hence,

$$E[N(Y) \mid X_1 = x, Y < x] = 0, \qquad E[N(Y) \mid X_1 = x, Y > x] = 1 + E[N(Y)]$$

and so,

$$\begin{aligned}
E[N(Y) \mid X_1 = x] &= E[N(Y) \mid X_1 = x, Y < x]\,P\{Y < x \mid X_1 = x\} \\
&\quad + E[N(Y) \mid X_1 = x, Y > x]\,P\{Y > x \mid X_1 = x\} \\
&= E[N(Y) \mid X_1 = x, Y > x]\,P\{Y > x\} \quad \text{since } Y \text{ and } X_1 \text{ are independent} \\
&= (1 + E[N(Y)])\,e^{-\lambda x}
\end{aligned}$$

Substituting this into Equation (7.30) gives

$$E[N(Y)] = (1 + E[N(Y)]) \int_0^\infty e^{-\lambda x} f(x)\,dx$$

or

$$E[N(Y)] = \frac{E[e^{-\lambda X}]}{1 - E[e^{-\lambda X}]} \qquad (7.31)$$

where $X$ has the renewal interarrival distribution.

If we let $\lambda = 1/t$, then Equation (7.31) presents an expression for the expected
number of renewals (not by time $t$, but) by a random exponentially distributed
time with mean $t$. However, as such a random variable need not be close to its
mean (its variance is $t^2$), Equation (7.31) need not be particularly close to $m(t)$. To
obtain an accurate approximation suppose that $Y_1, Y_2, \ldots, Y_n$ are independent
exponentials with rate $\lambda$ and suppose they are also independent of the renewal
process. Let, for $r = 1, \ldots, n$,

$$m_r = E[N(Y_1 + \cdots + Y_r)]$$

To compute an expression for $m_r$, we again start by conditioning on $X_1$, the time
of the first renewal:

$$m_r = \int_0^\infty E[N(Y_1 + \cdots + Y_r) \mid X_1 = x]\, f(x)\,dx \qquad (7.32)$$

To determine the foregoing conditional expectation, we now condition on the
number of partial sums $\sum_{i=1}^{j} Y_i$, $j = 1, \ldots, r$, that are less than $x$. Now, if all $r$
partial sums are less than $x$ (that is, if $\sum_{i=1}^{r} Y_i < x$) then clearly the number of
renewals by time $\sum_{i=1}^{r} Y_i$ is 0. On the other hand, given that $k$, $k < r$, of these
partial sums are less than $x$, it follows from the lack of memory property of the
exponential that the number of renewals by time $\sum_{i=1}^{r} Y_i$ will have the same
distribution as 1 plus $N(Y_{k+1} + \cdots + Y_r)$. Hence,

$$E\left[N(Y_1 + \cdots + Y_r) \,\Big|\, X_1 = x,\ k \text{ of the sums } \sum_{i=1}^{j} Y_i \text{ are less than } x\right]
= \begin{cases} 0, & \text{if } k = r \\ 1 + m_{r-k}, & \text{if } k < r \end{cases} \qquad (7.33)$$

To determine the distribution of the number of the partial sums that are less
than $x$, note that the successive values of these partial sums $\sum_{i=1}^{j} Y_i$, $j = 1, \ldots, r$,
have the same distribution as the first $r$ event times of a Poisson process with
rate $\lambda$ (since each successive partial sum is the previous sum plus an independent
exponential with rate $\lambda$). Hence, it follows that, for $k < r$,

$$P\left\{k \text{ of the partial sums } \sum_{i=1}^{j} Y_i \text{ are less than } x \,\Big|\, X_1 = x\right\}
= \frac{e^{-\lambda x}(\lambda x)^k}{k!} \qquad (7.34)$$

Upon substitution of Equations (7.33) and (7.34) into Equation (7.32), we obtain

$$m_r = \int_0^\infty \sum_{k=0}^{r-1} (1 + m_{r-k})\, e^{-\lambda x} \frac{(\lambda x)^k}{k!}\, f(x)\,dx$$

or, equivalently,

$$m_r = \frac{\sum_{k=1}^{r-1} (1 + m_{r-k})\, E[X^k e^{-\lambda X}]\,(\lambda^k / k!) + E[e^{-\lambda X}]}{1 - E[e^{-\lambda X}]} \qquad (7.35)$$

If we set $\lambda = n/t$, then starting with $m_1$ given by Equation (7.31), we can
use Equation (7.35) to recursively compute $m_2, \ldots, m_n$. The approximation of
$m(t) = E[N(t)]$ is given by $m_n = E[N(Y_1 + \cdots + Y_n)]$. Since $Y_1 + \cdots + Y_n$ is
the sum of $n$ independent exponential random variables each with mean $t/n$, it
follows that it is (gamma) distributed with mean $t$ and variance $nt^2/n^2 = t^2/n$.
Hence, by choosing $n$ large, $\sum_{i=1}^{n} Y_i$ will be a random variable having most of its
probability concentrated about $t$, and so $E[N(\sum_{i=1}^{n} Y_i)]$ should be quite close
to $E[N(t)]$. (Indeed, if $m(t)$ is continuous at $t$, it can be shown that these approximations
converge to $m(t)$ as $n$ goes to infinity.)
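The recursion of Equations (7.31) and (7.35) translates directly into code. The sketch below is illustrative rather than definitive: the function names are hypothetical, and the interarrival density is assumed to be $f(x) = x e^{-x}$ (gamma with parameters $(2, 1)$), for which $E[X^k e^{-\lambda X}] = (k+1)!/(1+\lambda)^{k+2}$ and the exact renewal function is $m(t) = t/2 - 1/4 + e^{-2t}/4$.

```python
import math


def approx_renewal_function(t, n, moment):
    """Return m_n from Equations (7.31) and (7.35) with lambda = n / t.

    moment(k, lam) must return E[X**k * exp(-lam * X)].
    """
    lam = n / t
    # a[k] = E[X^k e^{-lam X}] * lam^k / k!
    a = [moment(k, lam) * lam ** k / math.factorial(k) for k in range(n)]
    m = [0.0] * (n + 1)              # m[r] holds m_r; m[0] is unused
    for r in range(1, n + 1):
        numerator = a[0] + sum((1.0 + m[r - k]) * a[k] for k in range(1, r))
        m[r] = numerator / (1.0 - a[0])
    return m[n]


def gamma21_moment(k, lam):
    """E[X^k e^{-lam X}] for the density x e^{-x} (an assumed example)."""
    return math.factorial(k + 1) / (1.0 + lam) ** (k + 2)


def gamma21_exact(t):
    """Exact renewal function for the density x e^{-x}."""
    return t / 2.0 - 0.25 + math.exp(-2.0 * t) / 4.0


if __name__ == "__main__":
    for n in (1, 3, 10, 25, 50):
        print(n, approx_renewal_function(1.0, n, gamma21_moment))
    print("exact", gamma21_exact(1.0))
```

For $t = 1$ the values for $n = 1$ and $n = 3$ are $1/3$ and $0.304$, and the approximations approach the exact value $0.2838$ as $n$ grows.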
Table 7.1 Approximating m(t)

 F_i          Exact      Approximation
  i     t     m(t)       n = 1     n = 3     n = 10    n = 25    n = 50
  1     1     0.2838     0.3333    0.3040    0.2903    0.2865    0.2852
  1     2     0.7546     0.8000    0.7697    0.7586    0.7561    0.7553
  1     5     2.250      2.273     2.253     2.250     2.250     2.250
  1    10     4.75       4.762     4.751     4.750     4.750     4.750
  2     0.1   0.1733     0.1681    0.1687    0.1689    0.1690    -
  2     0.3   0.5111     0.4964    0.4997    0.5010    0.5014    -
  2     0.5   0.8404     0.8182    0.8245    0.8273    0.8281    0.8283
  2     1     1.6400     1.6087    1.6205    1.6261    1.6277    1.6283
  2     3     4.7389     4.7143    4.7294    4.7350    4.7363    4.7367
  2    10    15.5089    15.5000   15.5081   15.5089   15.5089   15.5089
  3     0.1   0.2819     0.2692    0.2772    0.2804    0.2813    -
  3     0.3   0.7638     0.7105    0.7421    0.7567    0.7609    -
  3     1     2.0890     2.0000    2.0556    2.0789    2.0850    2.0870
  3     3     5.4444     5.4000    5.4375    5.4437    5.4442    5.4443


Example 7.29 Table 7.1 compares the approximation with the exact value for
the distributions $F_i$ with densities $f_i$, $i = 1, 2, 3$, which are given by

$$\begin{aligned}
f_1(x) &= x e^{-x}, \\
1 - F_2(x) &= 0.3e^{-x} + 0.7e^{-2x}, \\
1 - F_3(x) &= 0.5e^{-x} + 0.5e^{-5x}
\end{aligned}$$
A counting process with independent interarrival times $X_1, X_2, \ldots$ is said to be
a delayed or general renewal process if $X_1$ has a different distribution from the
identically distributed random variables $X_2, X_3, \ldots$. That is, a delayed renewal
process is a renewal process in which the first interarrival time has a different
distribution than the others. Delayed renewal processes often arise in practice
and it is important to note that all of the limiting theorems about $N(t)$, the
number of events by time $t$, remain valid. For instance, it remains true that

$$\frac{E[N(t)]}{t} \to \frac{1}{\mu} \quad \text{and} \quad \frac{\mathrm{Var}(N(t))}{t} \to \frac{\sigma^2}{\mu^3} \quad \text{as } t \to \infty$$

where $\mu$ and $\sigma^2$ are the expected value and variance of the interarrivals $X_i$, $i > 1$.
7.9.1 Patterns of Discrete Random Variables


Let $X_1, X_2, \ldots$ be independent with $P\{X_i = j\} = p(j)$, $j \ge 0$, and let $T$ denote the
first time the pattern $x_1, \ldots, x_r$ occurs. If we say that a renewal occurs at time
$n$, $n \ge r$, if $(X_{n-r+1}, \ldots, X_n) = (x_1, \ldots, x_r)$, then $N(n)$, $n \ge 1$, is a delayed
renewal process, where $N(n)$ denotes the number of renewals by time $n$. It
follows that

$$\frac{E[N(n)]}{n} \to \frac{1}{\mu} \qquad (7.36)$$

$$\frac{\mathrm{Var}(N(n))}{n} \to \frac{\sigma^2}{\mu^3} \quad \text{as } n \to \infty \qquad (7.37)$$

where $\mu$ and $\sigma$ are, respectively, the mean and standard deviation of the time
between successive renewals. Whereas, in Section 3.6.4, we showed how to compute
the expected value of $T$, we will now show how to use renewal theory results
to compute both its mean and its variance.

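To begin, let $I(i)$ equal 1 if there is a renewal at time $i$ and let it be 0 otherwise,
$i \ge r$. Also, let $p = \prod_{i=1}^{r} p(x_i)$. Since

$$P\{I(i) = 1\} = P\{X_{i-r+1} = x_1, \ldots, X_i = x_r\} = p$$

it follows that $I(i)$, $i \ge r$, are Bernoulli random variables with parameter $p$. Now,

$$N(n) = \sum_{i=r}^{n} I(i)$$

so

$$E[N(n)] = \sum_{i=r}^{n} E[I(i)] = (n - r + 1)p$$

Dividing by $n$ and then letting $n \to \infty$ gives, from Equation (7.36),

$$\mu = 1/p \qquad (7.38)$$

That is, the mean time between successive occurrences of the pattern is equal to
$1/p$. Also,

$$\begin{aligned}
\frac{\mathrm{Var}(N(n))}{n} &= \frac{1}{n}\sum_{i=r}^{n} \mathrm{Var}(I(i)) + \frac{2}{n}\sum_{i=r}^{n-1}\ \sum_{n \ge j > i} \mathrm{Cov}(I(i), I(j)) \\
&= \frac{n - r + 1}{n}\,p(1 - p) + \frac{2}{n}\sum_{i=r}^{n-1}\ \sum_{i < j \le \min(i+r-1,\,n)} \mathrm{Cov}(I(i), I(j))
\end{aligned}$$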




where the final equality used the fact that $I(i)$ and $I(j)$ are independent, and thus
have zero covariance, when $|i - j| \ge r$. Letting $n \to \infty$, and using the fact that
$\mathrm{Cov}(I(i), I(j))$ depends on $i$ and $j$ only through $|j - i|$, gives

$$\frac{\mathrm{Var}(N(n))}{n} \to p(1 - p) + 2\sum_{j=1}^{r-1} \mathrm{Cov}(I(r), I(r + j))$$

Therefore, using Equations (7.37) and (7.38), we see that

$$\sigma^2 = p^{-2}(1 - p) + 2p^{-3}\sum_{j=1}^{r-1} \mathrm{Cov}(I(r), I(r + j)) \qquad (7.39)$$

Let us now consider the amount of “overlap” in the pattern. The overlap, equal
to the number of values at the end of one pattern that can be used as the beginning
part of the next pattern, is said to be of size $k$, $k > 0$, if, writing the pattern as $i_1, \ldots, i_r$,

$$k = \max\{j < r: (i_{r-j+1}, \ldots, i_r) = (i_1, \ldots, i_j)\}$$

and is of size 0 if for all $k = 1, \ldots, r - 1$, $(i_{r-k+1}, \ldots, i_r) \ne (i_1, \ldots, i_k)$. Thus, for
instance, the pattern 0, 0, 1, 1 has overlap 0, whereas 0, 0, 1, 0, 0 has overlap 2.
We consider two cases.
We consider two cases. 


Case 1: The Pattern Has Overlap 0 

In this case, $N(n)$, $n \ge 1$, is an ordinary renewal process and $T$ is distributed as
an interarrival time with mean $\mu$ and variance $\sigma^2$. Hence, we have the following
from Equation (7.38):

$$E[T] = \mu = \frac{1}{p} \qquad (7.40)$$

Also, since two patterns cannot occur within a distance less than $r$ of each other,
it follows that $I(r)I(r + j) = 0$ when $1 \le j \le r - 1$. Hence,

$$\mathrm{Cov}(I(r), I(r + j)) = -E[I(r)]E[I(r + j)] = -p^2, \quad \text{if } 1 \le j \le r - 1$$

Hence, from Equation (7.39) we obtain

$$\mathrm{Var}(T) = \sigma^2 = p^{-2}(1 - p) - 2p^{-3}(r - 1)p^2 = p^{-2} - (2r - 1)p^{-1} \qquad (7.41)$$

Remark In cases of “rare” patterns, if the pattern hasn’t yet occurred by some
time $n$, then it would seem that we would have no reason to believe that the
remaining time would be much less than if we were just beginning from scratch.
That is, it would seem that the distribution is approximately memoryless and
would thus be approximately exponentially distributed. Thus, since the variance
of an exponential is equal to its mean squared, we would expect when $\mu$ is large
that $\mathrm{Var}(T) \approx E^2[T]$, and this is borne out by the preceding, which states that
$\mathrm{Var}(T) = E^2[T] - (2r - 1)E[T]$.


Example 7.30 Suppose we are interested in the number of times that a fair coin
needs to be flipped before the pattern h, h, t, h, t occurs. For this pattern, $r = 5$,
$p = \frac{1}{32}$, and the overlap is 0. Hence, from Equations (7.40) and (7.41)

$$E[T] = 32, \qquad \mathrm{Var}(T) = 32^2 - 9 \times 32 = 736,$$

and

$$\mathrm{Var}(T)/E^2[T] = 0.71875$$

On the other hand, if $p(i) = i/10$, $i = 1, 2, 3, 4$ and the pattern is 1, 2, 1,
4, 1, 3, 2 then $r = 7$, $p = 3/625{,}000$, and the overlap is 0. Thus, again from
Equations (7.40) and (7.41), we see that in this case

$$E[T] = 208{,}333.33, \qquad \mathrm{Var}(T) = 4.34 \times 10^{10}, \qquad \mathrm{Var}(T)/E^2[T] = 0.99994$$


Case 2: The Overlap Is of Size k 
In this case,

$$T = T_{i_1, \ldots, i_k} + T^*$$

where $T_{i_1, \ldots, i_k}$ is the time until the pattern $i_1, \ldots, i_k$ appears and $T^*$, distributed
as an interarrival time of the renewal process, is the additional time that it takes,
starting with $i_1, \ldots, i_k$, to obtain the pattern $i_1, \ldots, i_r$. Because these random
variables are independent, we have

$$E[T] = E[T_{i_1, \ldots, i_k}] + E[T^*] \qquad (7.42)$$

$$\mathrm{Var}(T) = \mathrm{Var}(T_{i_1, \ldots, i_k}) + \mathrm{Var}(T^*) \qquad (7.43)$$

Now, from Equation (7.38)

$$E[T^*] = \mu = p^{-1} \qquad (7.44)$$

Also, since no two renewals can occur within a distance $r - k - 1$ of each other, it
follows that $I(r)I(r + j) = 0$ if $1 \le j \le r - k - 1$. Therefore, from Equation (7.39)
we see that

$$\begin{aligned}
\mathrm{Var}(T^*) = \sigma^2 &= p^{-2}(1 - p) + 2p^{-3}\left(\sum_{j=r-k}^{r-1} E[I(r)I(r + j)] - (r - 1)p^2\right) \\
&= p^{-2} - (2r - 1)p^{-1} + 2p^{-3}\sum_{j=r-k}^{r-1} E[I(r)I(r + j)] \qquad (7.45)
\end{aligned}$$

The quantities $E[I(r)I(r + j)]$ in Equation (7.45) can be calculated by considering
the particular pattern. To complete the calculation of the first two moments of $T$,
we then compute the mean and variance of $T_{i_1, \ldots, i_k}$ by repeating the same method.


Example 7.31 Suppose that we want to determine the number of flips of a fair
coin until the pattern h, h, t, h, h occurs. For this pattern, $r = 5$, $p = \frac{1}{32}$, and the
overlap parameter is $k = 2$. Because

$$E[I(5)I(8)] = P\{h,h,t,h,h,t,h,h\} = \tfrac{1}{256}$$

$$E[I(5)I(9)] = P\{h,h,t,h,h,h,t,h,h\} = \tfrac{1}{512}$$

we see from Equations (7.44) and (7.45) that

$$E[T^*] = 32,$$

$$\mathrm{Var}(T^*) = (32)^2 - 9(32) + 2(32)^3\left(\tfrac{1}{256} + \tfrac{1}{512}\right) = 1120$$

Hence, from Equations (7.42) and (7.43) we obtain

$$E[T] = E[T_{h,h}] + 32, \qquad \mathrm{Var}(T) = \mathrm{Var}(T_{h,h}) + 1120$$

Now, consider the pattern h, h. It has $r = 2$, $p = \frac{1}{4}$, and overlap parameter 1.
Since, for this pattern, $E[I(2)I(3)] = \frac{1}{8}$, we obtain, as in the preceding, that

$$E[T_{h,h}] = E[T_h] + 4,$$

$$\mathrm{Var}(T_{h,h}) = \mathrm{Var}(T_h) + 16 - 3(4) + 2\left(\tfrac{64}{8}\right) = \mathrm{Var}(T_h) + 20$$

Finally, for the pattern h, which has $r = 1$, $p = \frac{1}{2}$, we see from Equations (7.40)
and (7.41) that

$$E[T_h] = 2, \qquad \mathrm{Var}(T_h) = 2$$

Putting it all together gives

$$E[T] = 38, \qquad \mathrm{Var}(T) = 1142, \qquad \mathrm{Var}(T)/E^2[T] = 0.79086$$
■
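
The recursion used in Example 7.31 (peel off the overlapping prefix, apply Equations (7.42) through (7.45), and repeat on the prefix) can be sketched in Python as follows. The helper names `overlap`, `seq_prob`, and `pattern_moments` are illustrative assumptions, and the symbols are assumed to be independent with the probabilities given in `prob_of`.

```python
from fractions import Fraction

def overlap(pattern):
    """Largest k < r such that the last k symbols equal the first k symbols."""
    r = len(pattern)
    for k in range(r - 1, 0, -1):
        if pattern[:k] == pattern[-k:]:
            return k
    return 0

def seq_prob(seq, prob_of):
    """Probability that independent draws produce exactly this sequence."""
    p = Fraction(1)
    for symbol in seq:
        p *= prob_of[symbol]
    return p

def pattern_moments(pattern, prob_of):
    """Return (E[T], Var(T)) for the first appearance of the pattern."""
    pattern = tuple(pattern)
    r = len(pattern)
    k = overlap(pattern)
    p = seq_prob(pattern, prob_of)
    # Sum of E[I(r) I(r + j)] for j = r-k, ..., r-1: the pattern ends at time r
    # and again at time r + j, which needs the last r - j symbols to match
    # the first r - j symbols.
    cross = Fraction(0)
    for j in range(r - k, r):
        if pattern[j:] == pattern[:r - j]:
            cross += seq_prob(pattern + pattern[r - j:], prob_of)
    mean_star = 1 / p                                                   # (7.44)
    var_star = mean_star**2 - (2 * r - 1) * mean_star + 2 * mean_star**3 * cross  # (7.45)
    if k == 0:
        return mean_star, var_star
    prefix_mean, prefix_var = pattern_moments(pattern[:k], prob_of)
    return prefix_mean + mean_star, prefix_var + var_star              # (7.42), (7.43)

fair = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
mean, var = pattern_moments("hhthh", fair)
print(mean, var, float(var / mean**2))
```

When the overlap is 0 the sum in (7.45) is empty, so the same function also covers Case 1.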


Example 7.32 Suppose that $P\{X_n = i\} = p_i$, and consider the pattern 0, 1, 2, 0, 1,
3, 0, 1. Then $p = p_0^3 p_1^3 p_2 p_3$, $r = 8$, and the overlap parameter is $k = 2$. Since

$$E[I(8)I(14)] = p_0^5 p_1^5 p_2^2 p_3^2$$

$$E[I(8)I(15)] = 0$$

we see from Equations (7.42) and (7.44) that

$$E[T] = E[T_{0,1}] + p^{-1}$$

and from Equations (7.43) and (7.45) that

$$\mathrm{Var}(T) = \mathrm{Var}(T_{0,1}) + p^{-2} - 15p^{-1} + 2p^{-1}(p_0 p_1)^{-1}$$

Now, the $r$ and $p$ values of the pattern 0, 1 are $r(0, 1) = 2$, $p(0, 1) = p_0 p_1$, and
this pattern has overlap 0. Hence, from Equations (7.40) and (7.41),

$$E[T_{0,1}] = (p_0 p_1)^{-1}, \qquad \mathrm{Var}(T_{0,1}) = (p_0 p_1)^{-2} - 3(p_0 p_1)^{-1}$$

For instance, if $p_i = 0.2$, $i = 0, 1, 2, 3$, then

$$E[T] = 25 + 5^8 = 390{,}650$$

$$\mathrm{Var}(T) = 625 - 75 + 5^{16} + 35 \times 5^8 = 1.526 \times 10^{11}$$

$$\mathrm{Var}(T)/E^2[T] = 0.99996$$
■


Remark It can be shown that $T$ is a type of discrete random variable called
new better than used (NBU), which loosely means that if the pattern has not
yet occurred by some time $n$ then the additional time until it occurs tends to be
less than the time it would take the pattern to occur if one started all over at
that point. Such a random variable is known to satisfy (see Proposition 9.6.1 of
Ref. [4])

$$\mathrm{Var}(T) \le E^2[T] - E[T] \le E^2[T]$$
■


Now, suppose that there are $s$ patterns, $A(1), \ldots, A(s)$, and that we are interested
in the mean time until one of these patterns occurs, as well as the probability
mass function of the one that occurs first. Let us assume, without any loss of
generality, that none of the patterns is contained in any of the others. (That is, we rule
out such trivial cases as $A(1) = h, h$ and $A(2) = h, h, t$.) To determine the quantities
of interest, let $T(i)$ denote the time until pattern $A(i)$ occurs, $i = 1, \ldots, s$,
and let $T(i, j)$ denote the additional time, starting with the occurrence of pattern
$A(i)$, until pattern $A(j)$ occurs, $i \ne j$. Start by computing the expected values of
these random variables. We have already shown how to compute $E[T(i)]$, $i = 1, \ldots, s$.
To compute $E[T(i, j)]$, use the same approach, taking into account any
“overlap” between the latter part of $A(i)$ and the beginning part of $A(j)$. For
instance, suppose $A(1) = 0, 0, 1, 2, 0, 3$, and $A(2) = 2, 0, 3, 2, 0$. Then

$$T(2) = T_{2,0,3} + T(1, 2)$$

where $T_{2,0,3}$ is the time to obtain the pattern 2, 0, 3. Hence,

$$E[T(1, 2)] = E[T(2)] - E[T_{2,0,3}]$$

$$= (p_2^2 p_0^2 p_3)^{-1} + (p_0 p_2)^{-1} - (p_2 p_0 p_3)^{-1}$$


So, suppose now that all of the quantities $E[T(i)]$ and $E[T(i, j)]$ have been computed.
Let

$$M = \min_i T(i)$$

and let

$$P(i) = P\{M = T(i)\}, \qquad i = 1, \ldots, s$$

That is, $P(i)$ is the probability that pattern $A(i)$ is the first pattern to occur. Now,
for each $j$ we will derive an equation that $E[T(j)]$ satisfies as follows:

$$E[T(j)] = E[M] + E[T(j) - M]$$

$$= E[M] + \sum_{i: i \ne j} E[T(i, j)]P(i), \qquad j = 1, \ldots, s \tag{7.46}$$

where the final equality is obtained by conditioning on which pattern occurs first.
But Equations (7.46) along with the equation

$$\sum_{i=1}^{s} P(i) = 1$$

constitute a set of $s + 1$ equations in the $s + 1$ unknowns $E[M]$, $P(i)$, $i = 1, \ldots, s$.
Solving them yields the desired quantities.

Example 7.33 Suppose that we continually flip a fair coin. With $A(1) = h, t, t, h, h$
and $A(2) = h, h, t, h, t$, we have

$$E[T(1)] = 32 + E[T_h] = 34,$$

$$E[T(2)] = 32,$$

$$E[T(1, 2)] = E[T(2)] - E[T_{h,h}] = 32 - (4 + E[T_h]) = 26,$$

$$E[T(2, 1)] = E[T(1)] - E[T_{h,t}] = 34 - 4 = 30$$

Hence, we need to solve the equations

$$34 = E[M] + 30P(2),$$

$$32 = E[M] + 26P(1),$$

$$1 = P(1) + P(2)$$

These equations are easily solved, and yield the values

$$P(1) = P(2) = \tfrac{1}{2}, \qquad E[M] = 19$$

Note that although the mean time for pattern $A(2)$ is less than that for $A(1)$, each
has the same chance of occurring first. ■
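
The system in Example 7.33 can also be assembled and solved mechanically. The sketch below is illustrative only: the function names are assumptions, $E[T]$ for a single pattern is computed by unrolling Equations (7.42) and (7.44) into a sum of $1/P(\text{prefix})$ over every prefix that is also a suffix, and $E[T(i, j)]$ is obtained by subtracting the mean time of the longest suffix of $A(i)$ that is a prefix of $A(j)$. It assumes independent symbols and that no pattern contains another.

```python
from fractions import Fraction

def seq_prob(seq, prob_of):
    p = Fraction(1)
    for symbol in seq:
        p *= prob_of[symbol]
    return p

def mean_time(pattern, prob_of):
    """E[T]: sum of 1/P(prefix) over every prefix that is also a suffix."""
    r = len(pattern)
    return sum(1 / seq_prob(pattern[:k], prob_of)
               for k in range(1, r + 1) if pattern[:k] == pattern[r - k:])

def mean_additional(a, b, prob_of):
    """E[T(i, j)]: extra time to see b, starting just after a has occurred."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:           # longest suffix of a that is a prefix of b
            return mean_time(b, prob_of) - mean_time(b[:k], prob_of)
    return mean_time(b, prob_of)

def solve_linear(A, b):
    """Gauss-Jordan elimination over Fractions; assumes a unique solution."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

def first_pattern(patterns, prob_of):
    """Solve Equations (7.46) plus sum P(i) = 1; unknowns are E[M], P(1..s)."""
    s = len(patterns)
    A, b = [], []
    for j in range(s):
        row = [Fraction(1)] + [
            Fraction(0) if i == j
            else mean_additional(patterns[i], patterns[j], prob_of)
            for i in range(s)
        ]
        A.append(row)
        b.append(mean_time(patterns[j], prob_of))
    A.append([Fraction(0)] + [Fraction(1)] * s)
    b.append(Fraction(1))
    solution = solve_linear(A, b)
    return solution[0], solution[1:]

fair = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
expected_min, first_probs = first_pattern(["htthh", "hhtht"], fair)
print(expected_min, first_probs)
```

A small hand-written elimination over `Fraction` is used instead of a numerical library so that the probabilities come out as exact rationals.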


Equations (7.46) are easy to solve when there are no overlaps in any of the
patterns. In this case, for all $i \ne j$

$$E[T(i, j)] = E[T(j)]$$

so Equations (7.46) reduce to

$$E[T(j)] = E[M] + (1 - P(j))E[T(j)]$$

or

$$P(j) = E[M]/E[T(j)]$$

Summing the preceding over all $j$ yields

$$E[M] = \frac{1}{\sum_{j=1}^{s} 1/E[T(j)]} \tag{7.47}$$

$$P(j) = \frac{1/E[T(j)]}{\sum_{j=1}^{s} 1/E[T(j)]} \tag{7.48}$$

In our next example we use the preceding to reanalyze the model of Example 7.7.

Example 7.34 Suppose that each play of a game is, independently of the
outcomes of previous plays, won by player $i$ with probability $p_i$, $i = 1, \ldots, s$.
Suppose further that there are specified numbers $n(1), \ldots, n(s)$ such that the first
player $i$ to win $n(i)$ consecutive plays is declared the winner of the match. Find
the expected number of plays until there is a winner, and also the probability that
the winner is $i$, $i = 1, \ldots, s$.

Solution: Letting $A(i)$, for $i = 1, \ldots, s$, denote the pattern of $n(i)$ consecutive
values of $i$, this problem asks for $P(i)$, the probability that pattern $A(i)$ occurs
first, and for $E[M]$. Because

$$E[T(i)] = (1/p_i)^{n(i)} + (1/p_i)^{n(i)-1} + \cdots + 1/p_i = \frac{1 - p_i^{n(i)}}{p_i^{n(i)}(1 - p_i)}$$

we obtain, from Equations (7.47) and (7.48), that

$$E[M] = \frac{1}{\sum_{j=1}^{s} \left[p_j^{n(j)}(1 - p_j)/(1 - p_j^{n(j)})\right]}$$

$$P(i) = \frac{p_i^{n(i)}(1 - p_i)/(1 - p_i^{n(i)})}{\sum_{j=1}^{s} \left[p_j^{n(j)}(1 - p_j)/(1 - p_j^{n(j)})\right]}$$
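
As a quick numerical illustration of Example 7.34, the following sketch evaluates $E[M]$ and the $P(i)$ from Equations (7.47) and (7.48). The function names `run_mean` and `match_summary` are hypothetical; it assumes at least two players, each with win probability strictly between 0 and 1.

```python
from fractions import Fraction

def run_mean(p, n):
    """E[T(i)]: mean number of plays until a player with win probability p
    wins n consecutive plays."""
    p = Fraction(p)
    return (1 - p**n) / (p**n * (1 - p))

def match_summary(win_probs, run_lengths):
    """Return (E[M], [P(1), ..., P(s)]) using Equations (7.47) and (7.48)."""
    means = [run_mean(p, n) for p, n in zip(win_probs, run_lengths)]
    total_rate = sum(1 / m for m in means)
    expected_plays = 1 / total_rate                         # (7.47)
    winner_probs = [(1 / m) / total_rate for m in means]    # (7.48)
    return expected_plays, winner_probs

# Two evenly matched players, each needing 2 wins in a row
plays, probs = match_summary([Fraction(1, 2), Fraction(1, 2)], [2, 2])
print(plays, probs)
```

In the symmetric case shown, the match ends the first time two consecutive plays have the same winner.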


7.9.2 The Expected Time to a Maximal Run of Distinct Values 


Let $X_i$, $i \ge 1$, be independent and identically distributed random variables that
are equally likely to take on any of the values $1, 2, \ldots, m$. Suppose that these
random variables are observed sequentially, and let $T$ denote the first time that
a run of $m$ consecutive values includes all the values $1, \ldots, m$. That is,

$$T = \min\{n : X_{n-m+1}, \ldots, X_n \text{ are all distinct}\}$$

To compute $E[T]$, define a renewal process by letting the first renewal occur at
time $T$. At this point start over and, without using any of the data values up to $T$,
let the next renewal occur the next time a run of $m$ consecutive values are all
distinct, and so on. For instance, if $m = 3$ and the data are

$$1, 3, 3, 2, 1, 2, 3, 2, 1, 3, \ldots \tag{7.49}$$

then there are two renewals by time 10, with the renewals occurring at times
5 and 9. We call the sequence of $m$ distinct values that constitutes a renewal a
renewal run.

Let us now transform the renewal process into a delayed renewal reward process
by supposing that a reward of 1 is earned at time $n$, $n \ge m$, if the values
$X_{n-m+1}, \ldots, X_n$ are all distinct. That is, a reward is earned each time the previous
$m$ data values are all distinct. For instance, if $m = 3$ and the data values are
as in (7.49) then unit rewards are earned at times 5, 7, 9, and 10. If we let $R_i$
denote the reward earned at time $i$, then by Proposition 7.3,

$$\lim_{n \to \infty} \frac{E\left[\sum_{i=1}^{n} R_i\right]}{n} = \frac{E[R]}{E[T]} \tag{7.50}$$

where $R$ is the reward earned between renewal epochs. Now, with $A_i$ equal to
the set of the first $i$ data values of a renewal run, and $B_i$ to the set of the first $i$
values following this renewal run, we have the following:

$$E[R] = 1 + \sum_{i=1}^{m-1} E[\text{reward earned a time } i \text{ after a renewal}]$$

$$= 1 + \sum_{i=1}^{m-1} P\{A_i = B_i\}$$

$$= 1 + \sum_{i=1}^{m-1} \frac{i!}{m^i} = \sum_{i=0}^{m-1} \frac{i!}{m^i} \tag{7.51}$$

Hence, since for $i \ge m$

$$E[R_i] = P\{X_{i-m+1}, \ldots, X_i \text{ are all distinct}\} = \frac{m!}{m^m}$$

it follows from Equation (7.50) that

$$\frac{m!}{m^m} = \frac{E[R]}{E[T]}$$

Thus, from Equation (7.51) we obtain

$$E[T] = \frac{m^m}{m!} \sum_{i=0}^{m-1} \frac{i!}{m^i}$$
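
Combining Equation (7.51) with $m!/m^m = E[R]/E[T]$ gives $E[T] = (m^m/m!)\sum_{i=0}^{m-1} i!/m^i$, which is easy to evaluate exactly. The sketch below is illustrative; the function name is an assumption.

```python
from fractions import Fraction
from math import factorial

def expected_time_all_distinct(m):
    """E[T] = (m^m / m!) * sum_{i=0}^{m-1} i!/m^i for values uniform on 1..m."""
    reward_per_cycle = sum(Fraction(factorial(i), m**i) for i in range(m))  # E[R]
    return reward_per_cycle * Fraction(m**m, factorial(m))

for m in (2, 3, 4, 5):
    print(m, expected_time_all_distinct(m))
```

For $m = 2$ the answer can be checked by hand: after the first value, the wait for a different value is geometric with mean 2, so $E[T] = 3$.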


The preceding delayed renewal reward process approach also gives us another 
way of computing the expected time until a specified pattern appears. We illus- 
trate by the following example. 




Example 7.35 Compute $E[T]$, the expected time until the pattern h, h, h, t, h,
h, h appears, when a coin that comes up heads with probability $p$ and tails with
probability $q = 1 - p$ is continually flipped.

Solution: Define a renewal process by letting the first renewal occur when
the pattern first appears, and then start over. Also, say that a reward of 1
is earned whenever the pattern appears. If $R$ is the reward earned between
renewal epochs, we have

$$E[R] = 1 + \sum_{i=1}^{6} E[\text{reward earned } i \text{ units after a renewal}]$$

$$= 1 + 0 + 0 + 0 + p^3 q + p^3 q p + p^3 q p^2$$

Hence, since the expected reward earned at time $i$ is $E[R_i] = p^6 q$, we obtain
the following from the renewal reward theorem:

$$\frac{1 + qp^3 + qp^4 + qp^5}{E[T]} = qp^6$$

or

$$E[T] = q^{-1}p^{-6} + p^{-3} + p^{-2} + p^{-1}$$
■
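
The reward bookkeeping of Example 7.35 generalizes to any pattern: a reward can be earned $i$ units after a renewal only if the last $r - i$ symbols of the pattern match its first $r - i$ symbols, in which case the remaining $i$ symbols must then appear. A minimal sketch, assuming independent symbols and using the illustrative name `expected_time_by_rewards`:

```python
from fractions import Fraction

def seq_prob(seq, prob_of):
    p = Fraction(1)
    for symbol in seq:
        p *= prob_of[symbol]
    return p

def expected_time_by_rewards(pattern, prob_of):
    """E[T] = E[R] / P(pattern), with E[R] the expected reward per renewal cycle."""
    r = len(pattern)
    reward = Fraction(1)                      # the reward earned at the renewal itself
    for i in range(1, r):
        # The pattern can end i steps after a renewal only if its last r - i
        # symbols coincide with its first r - i symbols.
        if pattern[i:] == pattern[:r - i]:
            reward += seq_prob(pattern[r - i:], prob_of)
    return reward / seq_prob(pattern, prob_of)

p, q = Fraction(1, 3), Fraction(2, 3)
print(expected_time_by_rewards("hhhthhh", {"h": p, "t": q}))
```

For the pattern of the example this reproduces $q^{-1}p^{-6} + p^{-3} + p^{-2} + p^{-1}$.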


7.9.3 Increasing Runs of Continuous Random Variables 


Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed continuous
random variables, and let $T$ denote the first time that there is a string of $r$
consecutive increasing values. That is,

$$T = \min\{n \ge r : X_{n-r+1} < X_{n-r+2} < \cdots < X_n\}$$

To compute $E[T]$, define a renewal process as follows. Let the first renewal occur
at $T$. Then, using only the data values after $T$, say that the next renewal occurs
when there is again a string of $r$ consecutive increasing values, and continue in
this fashion. For instance, if $r = 3$ and the first 15 data values are

12, 20, 22, 28, 43, 18, 24, 33, 60, 4, 16, 8, 12, 15, 18

then 3 renewals would have occurred by time 15, namely, at times 3, 8, and 14.
If we let $N(n)$ denote the number of renewals by time $n$, then by the elementary
renewal theorem

$$\frac{E[N(n)]}{n} \to \frac{1}{E[T]}$$

To compute $E[N(n)]$, define a stochastic process whose state at time $k$, call it
$S_k$, is equal to the number of consecutive increasing values at time $k$. That is,
for $1 \le j \le k$

$$S_k = j \quad \text{if } X_{k-j} > X_{k-j+1} < \cdots < X_{k-1} < X_k$$

where $X_0 = \infty$. Note that a renewal will occur at time $k$ if and only if $S_k = ir$
for some $i \ge 1$. For instance, if $r = 3$ and

$$X_5 > X_6 < X_7 < X_8 < X_9 < X_{10} < X_{11}$$

then

$$S_6 = 1, \quad S_7 = 2, \quad S_8 = 3, \quad S_9 = 4, \quad S_{10} = 5, \quad S_{11} = 6$$

and renewals occur at times 8 and 11. Now, for $k > j$

$$P\{S_k = j\} = P\{X_{k-j} > X_{k-j+1} < \cdots < X_{k-1} < X_k\}$$

$$= P\{X_{k-j+1} < \cdots < X_{k-1} < X_k\} - P\{X_{k-j} < X_{k-j+1} < \cdots < X_{k-1} < X_k\}$$

$$= \frac{1}{j!} - \frac{1}{(j+1)!}$$

$$= \frac{j}{(j+1)!}$$

where the next to last equality follows since all possible orderings of the random
variables are equally likely.
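
The value $P\{S_k = j\} = j/(j+1)!$ can be confirmed by brute force for small $j$: since all $(j+1)!$ orderings of $X_{k-j}, \ldots, X_k$ are equally likely, it is enough to count the orderings with one initial drop followed by an increasing run. The function name below is illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def prob_run_state(j):
    """P{S_k = j} for k > j, by enumerating the relative orderings of
    X_{k-j}, ..., X_k (ranks[0] stands for X_{k-j})."""
    hits = 0
    for ranks in permutations(range(j + 1)):
        starts_with_drop = ranks[0] > ranks[1]
        increasing_after = all(ranks[i] < ranks[i + 1] for i in range(1, j))
        if starts_with_drop and increasing_after:
            hits += 1
    return Fraction(hits, factorial(j + 1))

for j in range(1, 6):
    print(j, prob_run_state(j), Fraction(j, factorial(j + 1)))
```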
From the preceding, we see that

$$\lim_{k \to \infty} P\{\text{a renewal occurs at time } k\} = \lim_{k \to \infty} \sum_{i=1}^{\infty} P\{S_k = ir\} = \sum_{i=1}^{\infty} \frac{ir}{(ir + 1)!}$$

However,

$$E[N(n)] = \sum_{k=1}^{n} P\{\text{a renewal occurs at time } k\}$$

Because we can show that for any numbers $a_k$, $k \ge 1$, for which $\lim_{k \to \infty} a_k$ exists
that

$$\lim_{n \to \infty} \frac{a_1 + \cdots + a_n}{n} = \lim_{k \to \infty} a_k$$

we obtain from the preceding, upon using the elementary renewal theorem,

$$E[T] = \frac{1}{\sum_{i=1}^{\infty} ir/(ir + 1)!}$$
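
The series $\sum_{i \ge 1} ir/(ir+1)!$ converges very quickly, so $E[T]$ is simple to evaluate numerically. The sketch below truncates the series after a fixed number of terms (the default of 30 is an arbitrary assumption, far more than needed for small $r$), and the function name is illustrative.

```python
from fractions import Fraction
from math import e, factorial

def expected_time_increasing_run(r, terms=30):
    """Approximate E[T] = 1 / sum_{i>=1} ir/(ir+1)! by truncating the series."""
    rate = sum(Fraction(i * r, factorial(i * r + 1)) for i in range(1, terms + 1))
    return float(1 / rate)

for r in (2, 3, 4):
    print(r, expected_time_increasing_run(r))
print("e =", e)
```

For $r = 2$ there is an independent check: $P\{T > n\} = P\{X_1 > X_2 > \cdots > X_n\} = 1/n!$, so $E[T] = \sum_{n \ge 0} 1/n! = e$.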


7.10 The Insurance Ruin Problem 


Suppose that claims are made to an insurance firm according to a Poisson process
with rate $\lambda$, and that the successive claim amounts $Y_1, Y_2, \ldots$ are independent
random variables having a common distribution function $F$ with density $f(x)$.
Suppose also that the claim amounts are independent of the claim arrival times.
Thus, if we let $M(t)$ be the number of claims made by time $t$, then $\sum_{i=1}^{M(t)} Y_i$ is
the total amount paid out in claims by time $t$. Supposing that the firm starts with
an initial capital $x$ and receives income at a constant rate $c$ per unit time, we are
interested in the probability that the firm’s net capital ever becomes negative; that
is, we are interested in

$$R(x) = P\left\{\sum_{i=1}^{M(t)} Y_i > x + ct \text{ for some } t \ge 0\right\}$$

If the firm’s capital ever becomes negative, we say that the firm is ruined; thus
$R(x)$ is the probability of ruin given that the firm begins with an initial capital $x$.

Let $\mu = E[Y_i]$ be the mean claim amount, and let $\rho = \lambda\mu/c$. Because claims
occur at rate $\lambda$, the long-run rate at which money is paid out is $\lambda\mu$. (A formal
argument uses renewal reward processes. A new cycle begins when a claim occurs;
the cost for the cycle is the claim amount, and so the long-run average cost is $\mu$, the
expected cost incurred in a cycle, divided by $1/\lambda$, the mean cycle time.) Because
the rate at which money is received is $c$, it is clear that $R(x) = 1$ when $\rho > 1$. As
$R(x)$ can be shown to also equal 1 when $\rho = 1$ (think of the recurrence of the
symmetric random walk), we will suppose that $\rho < 1$.

To determine $R(x)$, we start by deriving a differential equation. To begin, consider
what can happen in the first $h$ time units, where $h$ is small. With probability
$1 - \lambda h + o(h)$ there will be no claims and the firm’s capital at time $h$ will be
$x + ch$; with probability $\lambda h + o(h)$ there will be exactly one claim and the firm’s
capital at time $h$ will be $x + ch - Y_1$; with probability $o(h)$ there will be two
or more claims. Therefore, conditioning on what happens during the first $h$ time
units yields

$$R(x) = (1 - \lambda h)R(x + ch) + \lambda h\, E[R(x + ch - Y_1)] + o(h)$$

Equivalently,

$$R(x + ch) - R(x) = \lambda h\, R(x + ch) - \lambda h\, E[R(x + ch - Y_1)] + o(h)$$

Dividing through by $ch$ gives

$$\frac{R(x + ch) - R(x)}{ch} = \frac{\lambda}{c} R(x + ch) - \frac{\lambda}{c} E[R(x + ch - Y_1)] + \frac{1}{c}\frac{o(h)}{h}$$

Letting $h$ go to 0 yields the differential equation

$$R'(x) = \frac{\lambda}{c} R(x) - \frac{\lambda}{c} E[R(x - Y_1)]$$

Because $R(u) = 1$ when $u < 0$, the preceding can be written as

$$R'(x) = \frac{\lambda}{c} R(x) - \frac{\lambda}{c}\int_0^x R(x - y)f(y)\,dy - \frac{\lambda}{c}\int_x^{\infty} f(y)\,dy$$

or, equivalently,

$$R'(x) = \frac{\lambda}{c} R(x) - \frac{\lambda}{c}\int_0^x R(x - y)f(y)\,dy - \frac{\lambda}{c}\bar{F}(x) \tag{7.52}$$

where $\bar{F}(x) = 1 - F(x)$.
We will now use the preceding equation to show that $R(x)$ also satisfies the
equation

$$R(x) = R(0) + \frac{\lambda}{c}\int_0^x R(x - y)\bar{F}(y)\,dy - \frac{\lambda}{c}\int_0^x \bar{F}(y)\,dy, \qquad x \ge 0 \tag{7.53}$$

To verify Equation (7.53), we will show that differentiating both sides of it results
in Equation (7.52). (It can be shown that both (7.52) and (7.53) have unique
solutions.) To do so, we will need the following lemma, whose proof is given at
the end of this section.

Lemma 7.5 For a function $k$, and a differentiable function $t$,

$$\frac{d}{dx}\int_0^x t(x - y)k(y)\,dy = t(0)k(x) + \int_0^x t'(x - y)k(y)\,dy$$

Differentiating both sides of Equation (7.53) gives, upon using the preceding
lemma,

$$R'(x) = \frac{\lambda}{c}\left[R(0)\bar{F}(x) + \int_0^x R'(x - y)\bar{F}(y)\,dy - \bar{F}(x)\right] \tag{7.54}$$

Integration by parts $[u = \bar{F}(y),\ dv = R'(x - y)\,dy]$ shows that

$$\int_0^x R'(x - y)\bar{F}(y)\,dy = -\bar{F}(y)R(x - y)\Big|_0^x - \int_0^x R(x - y)f(y)\,dy$$

$$= -\bar{F}(x)R(0) + R(x) - \int_0^x R(x - y)f(y)\,dy$$

Substituting this result back in Equation (7.54) gives Equation (7.52). Thus, we
have established Equation (7.53).

To obtain a more usable expression for $R(x)$, consider a renewal process whose
interarrival times $X_1, X_2, \ldots$ are distributed according to the equilibrium distribution
of $F$. That is, the density function of the $X_i$ is

$$f_e(x) = F_e'(x) = \frac{\bar{F}(x)}{\mu}$$

Let $N(t)$ denote the number of renewals by time $t$, and let us derive an expression
for

$$q(x) = E\left[\rho^{N(x)+1}\right]$$

Conditioning on $X_1$ gives

$$q(x) = \int_0^{\infty} E\left[\rho^{N(x)+1} \mid X_1 = y\right]\frac{\bar{F}(y)}{\mu}\,dy$$

Because, given that $X_1 = y$, the number of renewals by time $x$ is distributed as
$1 + N(x - y)$ when $y \le x$, or is identically 0 when $y > x$, we see that

$$E\left[\rho^{N(x)+1} \mid X_1 = y\right] = \begin{cases} \rho\, E\left[\rho^{N(x-y)+1}\right], & \text{if } y \le x \\ \rho, & \text{if } y > x \end{cases}$$

Therefore, $q(x)$ satisfies

$$q(x) = \int_0^x \rho\, q(x - y)\frac{\bar{F}(y)}{\mu}\,dy + \rho\int_x^{\infty}\frac{\bar{F}(y)}{\mu}\,dy$$

$$= \frac{\lambda}{c}\int_0^x q(x - y)\bar{F}(y)\,dy + \frac{\lambda}{c}\left[\int_0^{\infty}\bar{F}(y)\,dy - \int_0^x \bar{F}(y)\,dy\right]$$

$$= \frac{\lambda}{c}\int_0^x q(x - y)\bar{F}(y)\,dy + \rho - \frac{\lambda}{c}\int_0^x \bar{F}(y)\,dy$$

Because $q(0) = \rho$, this is exactly the same equation that is satisfied by $R(x)$, namely
Equation (7.53). Therefore, because the solution to (7.53) is unique, we obtain
the following.

Proposition 7.6

$$R(x) = q(x) = E\left[\rho^{N(x)+1}\right]$$
Example 7.36 Suppose that the firm does not start with any initial capital. Then,
because $N(0) = 0$, we see that the firm’s probability of ruin is $R(0) = \rho$. ■

Example 7.37 If the claim distribution $F$ is exponential with mean $\mu$, then so is
$F_e$. Hence, $N(x)$ is Poisson with mean $x/\mu$, giving the result

$$R(x) = E\left[\rho^{N(x)+1}\right] = \sum_{n=0}^{\infty} \rho^{n+1} e^{-x/\mu}(x/\mu)^n/n!$$

$$= \rho\, e^{-x/\mu}\sum_{n=0}^{\infty} (\rho x/\mu)^n/n!$$

$$= \rho\, e^{-x(1-\rho)/\mu}$$
■
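
For exponential claims the closed form $R(x) = \rho e^{-x(1-\rho)/\mu}$ can be compared against a direct evaluation of $E[\rho^{N(x)+1}]$ with $N(x)$ Poisson with mean $x/\mu$. This is a minimal sketch; the function names, parameter names, and the numerical values in the usage lines are illustrative assumptions.

```python
from math import exp

def ruin_prob_exponential(x, lam, mu, c):
    """Closed form for exponential claims with mean mu; rho = lam * mu / c."""
    rho = lam * mu / c
    if rho >= 1:
        return 1.0                       # ruin is certain when rho >= 1
    return rho * exp(-x * (1 - rho) / mu)

def ruin_prob_series(x, lam, mu, c, terms=200):
    """E[rho^(N(x)+1)] summed term by term, with N(x) ~ Poisson(x / mu)."""
    rho = lam * mu / c
    mean = x / mu
    total = 0.0
    poisson_term = exp(-mean)            # P{N(x) = 0}
    for n in range(terms):
        total += rho ** (n + 1) * poisson_term
        poisson_term *= mean / (n + 1)   # advance to P{N(x) = n + 1}
    return total

closed = ruin_prob_exponential(5.0, lam=1.0, mu=1.0, c=2.0)
series = ruin_prob_series(5.0, lam=1.0, mu=1.0, c=2.0)
print(closed, series)
```

With $x = 0$ the closed form returns $\rho$, in agreement with Example 7.36.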


To obtain some intuition about the ruin probability, let $T$ be independent of
the interarrival times $X_i$ of the renewal process having interarrival distribution
$F_e$, and let $T$ have probability mass function

$$P\{T = n\} = \rho^n(1 - \rho), \qquad n = 0, 1, \ldots$$

Now consider $P\left\{\sum_{i=1}^{T} X_i > x\right\}$, the probability that the sum of the first $T$ of
the $X_i$ exceeds $x$. Because $N(x) + 1$ is the first renewal that occurs after time $x$,
we have

$$N(x) + 1 = \min\left\{n : \sum_{i=1}^{n} X_i > x\right\}$$

Therefore, conditioning on the number of renewals by time $x$ gives

$$P\left\{\sum_{i=1}^{T} X_i > x\right\} = \sum_{j=0}^{\infty} P\left\{\sum_{i=1}^{T} X_i > x \,\Big|\, N(x) = j\right\}P\{N(x) = j\}$$

$$= \sum_{j=0}^{\infty} P\{T \ge j + 1 \mid N(x) = j\}P\{N(x) = j\}$$

$$= \sum_{j=0}^{\infty} P\{T \ge j + 1\}P\{N(x) = j\}$$

$$= \sum_{j=0}^{\infty} \rho^{j+1}P\{N(x) = j\}$$

$$= E\left[\rho^{N(x)+1}\right]$$

Consequently, $P\left\{\sum_{i=1}^{T} X_i > x\right\}$ is equal to the ruin probability. Now, as noted
in Example 7.36, the ruin probability of a firm starting with 0 initial capital is $\rho$.
Suppose that the firm starts with an initial capital $x$, and suppose for the moment
that it is allowed to remain in business even if its capital becomes negative. Because
the probability that the firm’s capital ever falls below its initial starting amount $x$
is the same as the probability that its capital ever becomes negative when it starts
with 0, this probability is also $\rho$. Thus, if we say that a low occurs whenever the
firm’s capital becomes lower than it has ever previously been, then the probability
that a low ever occurs is $\rho$. Now, if a low does occur, then the probability that
there will be another low is the probability that the firm’s capital will ever fall
below its previous low, and clearly this is also $\rho$. Therefore, each new low is
the final one with probability $1 - \rho$. Consequently, the total number of lows
that ever occur has the same distribution as $T$. In addition, if we let $W_i$ be the
amount by which the $i$th low is less than the low preceding it, it is easy to see that
$W_1, W_2, \ldots$ are independent and identically distributed, and are also independent
of the number of lows. Because the minimal value over all time of the firm’s capital
(when it is allowed to remain in business even when its capital becomes negative)
is $x - \sum_{i=1}^{T} W_i$, it follows that the ruin probability of a firm that starts with an
initial capital $x$ is

$$R(x) = P\left\{\sum_{i=1}^{T} W_i > x\right\}$$

Because

$$R(x) = E\left[\rho^{N(x)+1}\right] = P\left\{\sum_{i=1}^{T} X_i > x\right\}$$

we can identify $W_i$ with $X_i$. That is, we can conclude that each new low is lower
than its predecessor by a random amount whose distribution is the equilibrium
distribution of a claim amount.
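
The representation of the ruin probability as $P\{\sum_{i=1}^{T} X_i > x\}$, with $T$ geometric and the $X_i$ drawn from the equilibrium distribution, can be checked numerically in the exponential case, where the equilibrium distribution is again exponential with mean $\mu$ and a sum of $n$ such variables has an Erlang distribution. The sketch below is illustrative: the function names are assumptions, and the geometric sum is truncated at an arbitrary number of terms.

```python
from math import exp

def erlang_tail(n, rate, x):
    """P{sum of n independent exponentials with this rate exceeds x}
    = sum_{k=0}^{n-1} e^{-rate x} (rate x)^k / k!."""
    term, total = exp(-rate * x), 0.0
    for k in range(n):
        total += term
        term *= rate * x / (k + 1)
    return total

def geometric_sum_tail(x, rho, mu, terms=400):
    """P{X_1 + ... + X_T > x} with P{T = n} = rho^n (1 - rho) and
    X_i exponential with mean mu (series truncated after `terms` values of n)."""
    return sum(rho**n * (1 - rho) * erlang_tail(n, 1 / mu, x)
               for n in range(1, terms))

x, rho, mu = 3.0, 0.5, 2.0
print(geometric_sum_tail(x, rho, mu), rho * exp(-x * (1 - rho) / mu))
```

The two printed numbers should agree to many decimal places, matching the closed form of Example 7.37.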


Remark Because the times between successive customer claims are independent
exponential random variables with mean $1/\lambda$ while money is being paid to the
insurance firm at a constant rate $c$, it follows that the amounts of money paid in
to the insurance company between consecutive claims are independent exponential
random variables with mean $c/\lambda$. Thus, because ruin can only occur when a
claim arises, it follows that the expression given in Proposition 7.6 for the ruin
probability $R(x)$ is valid for any model in which the amounts of money paid
to the insurance firm between claims are independent exponential random variables
with mean $c/\lambda$ and the amounts of the successive claims are independent
random variables having distribution function $F$, with these two processes being
independent.

Now imagine an insurance model in which customers buy policies at arbitrary
times, each customer pays the insurance company a fixed rate $c$ per unit time, the
time until a customer makes a claim is exponential with rate $\lambda$, and each claim
amount has distribution $F$. Consider the amount of money the insurance firm
takes in between claims. Specifically, suppose a claim has just occurred and let $X$
be the amount the insurance company takes in before another claim arises. Note
that this amount increases continuously in time until a claim occurs, and suppose
that at the present time the amount $t$ has been taken in since the last claim.
Let us compute the probability that a claim will be made before the amount
taken in increases by an additional amount $h$, when $h$ is small. To determine this
probability, suppose that at the present time the firm has $k$ customers. Because
each of these $k$ customers is paying the insurance firm at rate $c$, it follows that the
additional amount taken in by the firm before the next claim occurs will be less
than $h$ if and only if a claim is made within the next $\frac{h}{kc}$ time units. Because each of
the $k$ customers will register a claim at an exponential rate $\lambda$, the time until one of
them makes a claim is an exponential random variable with rate $k\lambda$. Calling this
random variable $E_{k\lambda}$, it follows that the probability that the additional amount
taken in is less than $h$ is

$$P(\text{additional amount} < h \mid k \text{ customers}) = P\left(E_{k\lambda} < \frac{h}{kc}\right)$$

$$= 1 - e^{-\lambda h/c}$$

$$= \frac{\lambda}{c}h + o(h)$$

Thus,

$$P(X < t + h \mid X > t) = \frac{\lambda}{c}h + o(h)$$

showing that the failure rate function of $X$ is identically $\lambda/c$. But this means that the
amounts taken in between claims are exponential random variables with mean
$c/\lambda$. Because the amounts of each claim have distribution function $F$, we can thus
conclude that the firm’s failure probability in this insurance model is exactly the
same as in the previously analyzed classical model. ■

Let us now give the proof of Lemma 7.5.

Proof of Lemma 7.5. Let $G(x) = \int_0^x t(x - y)k(y)\,dy$. Then

$$G(x + h) - G(x) = G(x + h) - \int_0^x t(x + h - y)k(y)\,dy + \int_0^x t(x + h - y)k(y)\,dy - G(x)$$

$$= \int_x^{x+h} t(x + h - y)k(y)\,dy + \int_0^x [t(x + h - y) - t(x - y)]k(y)\,dy$$

Dividing through by $h$ gives

$$\frac{G(x + h) - G(x)}{h} = \frac{1}{h}\int_x^{x+h} t(x + h - y)k(y)\,dy + \int_0^x \frac{t(x + h - y) - t(x - y)}{h}k(y)\,dy$$

Letting $h \to 0$ gives the result

$$G'(x) = t(0)k(x) + \int_0^x t'(x - y)k(y)\,dy$$
■

Exercises 


1. Is it true that
(a) $N(t) < n$ if and only if $S_n > t$?
(b) $N(t) \le n$ if and only if $S_n \ge t$?
(c) $N(t) > n$ if and only if $S_n < t$?

2. Suppose that the interarrival distribution for a renewal process is Poisson distributed
with mean $\mu$. That is, suppose

$$P\{X_n = k\} = e^{-\mu}\frac{\mu^k}{k!}, \qquad k = 0, 1, \ldots$$

(a) Find the distribution of $S_n$.
(b) Calculate $P\{N(t) = n\}$.


*3. If the mean-value function of the renewal process $\{N(t), t \ge 0\}$ is given by $m(t) =
t/2$, $t \ge 0$, what is $P\{N(5) = 0\}$?

4. Let $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ be independent renewal processes. Let $N(t) =
N_1(t) + N_2(t)$.
(a) Are the interarrival times of $\{N(t), t \ge 0\}$ independent?
(b) Are they identically distributed?
(c) Is $\{N(t), t \ge 0\}$ a renewal process?

5. Let $U_1, U_2, \ldots$ be independent uniform (0, 1) random variables, and define $N$ by

$$N = \min\{n : U_1 + U_2 + \cdots + U_n > 1\}$$

What is $E[N]$?

*6. Consider a renewal process $\{N(t), t \ge 0\}$ having a gamma $(r, \lambda)$ interarrival distribution.
That is, the interarrival density is

$$f(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{r-1}}{(r - 1)!}, \qquad x > 0$$

(a) Show that

$$P\{N(t) \ge n\} = \sum_{i=nr}^{\infty} \frac{e^{-\lambda t}(\lambda t)^i}{i!}$$

(b) Show that

$$m(t) = \sum_{i=r}^{\infty} \left[\frac{i}{r}\right]\frac{e^{-\lambda t}(\lambda t)^i}{i!}$$

where $[i/r]$ is the largest integer less than or equal to $i/r$.

Hint: Use the relationship between the gamma $(r, \lambda)$ distribution and the sum of $r$
independent exponentials with rate $\lambda$ to define $N(t)$ in terms of a Poisson process
with rate $\lambda$.

7. Mr. Smith works on a temporary basis. The mean length of each job he gets is three
months. If the amount of time he spends between jobs is exponentially distributed
with mean 2, then at what rate does Mr. Smith get new jobs?

*8. A machine in use is replaced by a new machine either when it fails or when it reaches
the age of $T$ years. If the lifetimes of successive machines are independent with a
common distribution $F$ having density $f$, show that
(a) the long-run rate at which machines are replaced equals

$$\left[\int_0^T x f(x)\,dx + T(1 - F(T))\right]^{-1}$$


(b) the long-run rate at which machines in use fail equals

$$\frac{F(T)}{\int_0^T x f(x)\,dx + T[1 - F(T)]}$$

9. A worker sequentially works on jobs. Each time a job is completed, a new one is
begun. Each job, independently, takes a random amount of time having distribution
$F$ to complete. However, independently of this, shocks occur according to a Poisson
process with rate $\lambda$. Whenever a shock occurs, the worker discontinues working
on the present job and starts a new one. In the long run, at what rate are jobs
completed?

10. Consider a renewal process with mean interarrival time $\mu$. Suppose that each event
of this process is independently “counted” with probability $p$. Let $N_C(t)$ denote the
number of counted events by time $t$, $t > 0$.
(a) Is $N_C(t)$, $t \ge 0$ a renewal process?
(b) What is $\lim_{t \to \infty} N_C(t)/t$?

11. A renewal process for which the time until the initial renewal has a different distribution
than the remaining interarrival times is called a delayed (or a general)
renewal process. Prove that Proposition 7.1 remains valid for a delayed renewal
process. (In general, it can be shown that all of the limit theorems for a renewal
process remain valid for a delayed renewal process provided that the time until the
first renewal has a finite mean.)

12. Events occur according to a Poisson process with rate $\lambda$. Any event that occurs
within a time $d$ of the event that immediately preceded it is called a $d$-event. For
instance, if $d = 1$ and events occur at times 2, 2.8, 4, 6, 6.6, ..., then the events at
times 2.8 and 6.6 would be $d$-events.
(a) At what rate do $d$-events occur?
(b) What proportion of all events are $d$-events?

13. Let $X_1, X_2, \ldots$ be a sequence of independent random variables. The nonnegative
integer valued random variable $N$ is said to be a stopping time for the sequence if
the event $\{N = n\}$ is independent of $X_{n+1}, X_{n+2}, \ldots$. The idea is that the $X_i$
are observed one at a time (first $X_1$, then $X_2$, and so on) and $N$ represents the
number observed when we stop. Hence, the event $\{N = n\}$ corresponds to stopping
after having observed $X_1, \ldots, X_n$ and thus must be independent of the values of
random variables yet to come, namely, $X_{n+1}, X_{n+2}, \ldots$.
(a) Let $X_1, X_2, \ldots$ be independent with

$$P\{X_i = 1\} = p = 1 - P\{X_i = 0\}, \qquad i \ge 1$$

Define

$$N_1 = \min\{n : X_1 + \cdots + X_n = 5\}$$

$$N_2 = \begin{cases} 3, & \text{if } X_1 = 0 \\ 5, & \text{if } X_1 = 1 \end{cases}$$

$$N_3 = \begin{cases} 3, & \text{if } X_4 = 0 \\ 2, & \text{if } X_4 = 1 \end{cases}$$

Which of the $N_i$ are stopping times for the sequence $X_1, \ldots$? An important
result, known as Wald’s equation, states that if $X_1, X_2, \ldots$ are independent and
identically distributed and have a finite mean $E(X)$, and if $N$ is a stopping time
for this sequence having a finite mean, then

$$E\left[\sum_{i=1}^{N} X_i\right] = E[N]E[X]$$

To prove Wald’s equation, let us define the indicator variables $I_i$, $i \ge 1$ by

$$I_i = \begin{cases} 1, & \text{if } i \le N \\ 0, & \text{if } i > N \end{cases}$$

(b) Show that

$$\sum_{i=1}^{N} X_i = \sum_{i=1}^{\infty} X_i I_i$$

From part (b) we see that

$$E\left[\sum_{i=1}^{N} X_i\right] = E\left[\sum_{i=1}^{\infty} X_i I_i\right] = \sum_{i=1}^{\infty} E[X_i I_i]$$

where the last equality assumes that the expectation can be brought inside the
summation (as indeed can be rigorously proven in this case).
(c) Argue that $X_i$ and $I_i$ are independent.

Hint: $I_i$ equals 0 or 1 depending on whether or not we have yet stopped after
observing which random variables?

(d) From part (c) we have

$$E\left[\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^{\infty} E[X]E[I_i]$$

Complete the proof of Wald’s equation.
(e) What does Wald’s equation tell us about the stopping times in part (a)?

14. Wald’s equation can be used as the basis of a proof of the elementary renewal
theorem. Let $X_1, X_2, \ldots$ denote the interarrival times of a renewal process and let
$N(t)$ be the number of renewals by time $t$.
(a) Show that whereas $N(t)$ is not a stopping time, $N(t) + 1$ is.

Hint: Note that

$$N(t) = n \iff X_1 + \cdots + X_n \le t \text{ and } X_1 + \cdots + X_{n+1} > t$$
(b) Argue that

$$E\left[\sum_{i=1}^{N(t)+1} X_i\right] = \mu[m(t) + 1]$$

(c) Suppose that the $X_i$ are bounded random variables. That is, suppose there is
a constant $M$ such that $P\{X_i < M\} = 1$. Argue that

$$t < \sum_{i=1}^{N(t)+1} X_i < t + M$$

(d) Use the previous parts to prove the elementary renewal theorem when the
interarrival times are bounded.

15. Consider a miner trapped in a room that contains three doors. Door 1 leads him to
freedom after two days of travel; door 2 returns him to his room after a four-day
journey; and door 3 returns him to his room after a six-day journey. Suppose at
all times he is equally likely to choose any of the three doors, and let $T$ denote the
time it takes the miner to become free.
(a) Define a sequence of independent and identically distributed random variables
$X_1, X_2, \ldots$ and a stopping time $N$ such that

$$T = \sum_{i=1}^{N} X_i$$

Note: You may have to imagine that the miner continues to randomly choose
doors even after he reaches safety.

(b) Use Wald’s equation to find $E[T]$.
(c) Compute $E\left[\sum_{i=1}^{N} X_i \mid N = n\right]$ and note that it is not equal to $E\left[\sum_{i=1}^{n} X_i\right]$.
(d) Use part (c) for a second derivation of $E[T]$.

16. A deck of 52 playing cards is shuffled and the cards are then turned face up one at
a time. Let $X_i$ equal 1 if the $i$th card turned over is an ace, and let it be 0 otherwise,
$i = 1, \ldots, 52$. Also, let $N$ denote the number of cards that need be turned over until
all four aces appear. That is, the final ace appears on the $N$th card to be turned
over. Is the equation

$$E\left[\sum_{i=1}^{N} X_i\right] = E[N]E[X_i]$$

valid? If not, why is Wald’s equation not applicable?

17. In Example 7.6, suppose that potential customers arrive in accordance with a
renewal process having interarrival distribution $F$. Would the number of events
by time $t$ constitute a (possibly delayed) renewal process if an event corresponds to
a customer
(a) entering the bank?
(b) leaving the bank?
What if $F$ were exponential?


*18. Compute the renewal function when the interarrival distribution $F$ is such that

$$1 - F(t) = pe^{-\mu_1 t} + (1 - p)e^{-\mu_2 t}$$

19. For the renewal process whose interarrival times are uniformly distributed over
(0, 1), determine the expected time from $t = 1$ until the next renewal.

20. For a renewal reward process consider

$$W_n = \frac{R_1 + R_2 + \cdots + R_n}{X_1 + X_2 + \cdots + X_n}$$

where $W_n$ represents the average reward earned during the first $n$ cycles. Show that
$W_n \to E[R]/E[X]$ as $n \to \infty$.

21. Consider a single-server bank for which customers arrive in accordance with a Poisson
process with rate $\lambda$. If a customer will enter the bank only if the server is free
when he arrives, and if the service time of a customer has the distribution $G$, then
what proportion of time is the server busy?

*22. The lifetime of a car has a distribution $H$ and probability density $h$. Ms. Jones buys
a new car as soon as her old car either breaks down or reaches the age of $T$ years. A
new car costs $C_1$ dollars and an additional cost of $C_2$ dollars is incurred whenever a
car breaks down. Assuming that a $T$-year-old car in working order has an expected
resale value $R(T)$, what is Ms. Jones’ long-run average cost?

23. Consider the gambler’s ruin problem where on each bet the gambler either wins 1
with probability $p$ or loses 1 with probability $1 - p$. The gambler will continue to
play until his winnings are either $N - i$ or $-i$. (That is, starting with $i$ the gambler
will quit when his fortune reaches either $N$ or 0.) Let $T$ denote the number of
bets made before the gambler stops. Use Wald’s equation, along with the known
probability that the gambler’s final winnings are $N - i$, to find $E[T]$.

Hint: Let $X_j$ be the gambler’s winnings on bet $j$, $j \ge 1$. What are the possible values
of $\sum_{j=1}^{T} X_j$? What is $E\left[\sum_{j=1}^{T} X_j\right]$?

24. Wald’s equation can also be proved by using renewal reward processes. Let $N$ be a
stopping time for the sequence of independent and identically distributed random
variables $X_i$, $i \ge 1$.
(a) Let $N_1 = N$. Argue that the sequence of random variables $X_{N_1+1}, X_{N_1+2}, \ldots$
is independent of $X_1, \ldots, X_N$ and has the same distribution as the original
sequence $X_i$, $i \ge 1$.
Now treat $X_{N_1+1}, X_{N_1+2}, \ldots$ as a new sequence, and define a stopping time
$N_2$ for this sequence that is defined exactly as $N_1$ is on the original sequence.
(For instance, if $N_1 = \min\{n : X_n > 0\}$, then $N_2 = \min\{n : X_{N_1+n} > 0\}$.)
Similarly, define a stopping time $N_3$ on the sequence $X_{N_1+N_2+1}, X_{N_1+N_2+2}, \ldots$
that is identically defined on this sequence as $N_1$ is on the original sequence,
and so on.
(b) Is the reward process in which $X_i$ is the reward earned during period $i$ a renewal
reward process? If so, what is the length of the successive cycles?
(c) Derive an expression for the average reward per unit time.
(d) Use the strong law of large numbers to derive a second expression for the
average reward per unit time.
(e) Conclude Wald’s equation.


Suppose in Example 7.13 that the arrival process is a Poisson process and suppose 
that the policy employed is to dispatch the train every t time units.
26.

27.

28.

29.


(a) Determine the average cost per unit time. 

(b) Show that the minimal average cost per unit time for such a policy is approx- 
imately c/2 plus the average cost per unit time for the best policy of the type 
considered in that example. 


Consider a train station to which customers arrive in accordance with a Poisson
process having rate \(\lambda\). A train is summoned whenever there are N customers waiting
in the station, but it takes K units of time for the train to arrive at the station. When
it arrives, it picks up all waiting customers. Assuming that the train station incurs
a cost at a rate of nc per unit time whenever there are n customers present, find the
long-run average cost.


A machine consists of two independent components, the ith of which functions for
an exponential time with rate \(\lambda_i\). The machine functions as long as at least one of
these components functions. (That is, it fails when both components have failed.)
When a machine fails, a new machine having both its components working is put
into use. A cost K is incurred whenever a machine failure occurs; operating costs
at rate \(c_i\) per unit time are incurred whenever the machine in use has i working
components, i = 1, 2. Find the long-run average cost per unit time.


In Example 7.15, what proportion of the defective items produced is discovered? 


Consider a single-server queueing system in which customers arrive in accordance 
with a renewal process. Each customer brings in a random amount of work, chosen 
independently according to the distribution G. The server serves one customer at a 
time. However, the server processes work at rate i per unit time whenever there are i 
customers in the system. For instance, if a customer with workload 8 enters service 
when there are three other customers waiting in line, then if no one else arrives 
that customer will spend 2 units of time in service. If another customer arrives after 
1 unit of time, then our customer will spend a total of 1.8 units of time in service 
provided no one else arrives. 

Let \(W_i\) denote the amount of time customer i spends in the system. Also, define
E[W] by


\[
E[W] = \lim_{n \to \infty} \frac{W_1 + \cdots + W_n}{n}
\]


and so E[W] is the average amount of time a customer spends in the system.
Let N denote the number of customers that arrive in a busy period.
(a) Argue that


\[
E[W] = \frac{E[W_1 + \cdots + W_N]}{E[N]}
\]


Let \(L_i\) denote the amount of work customer i brings into the system; and so
the \(L_i\), \(i \ge 1\), are independent random variables having distribution G.

(b) Argue that at any time t, the sum of the times spent in the system by all arrivals
prior to t is equal to the total amount of work processed by time t.


Hint: Consider the rate at which the server processes work.
(c) Argue that


\[
\sum_{i=1}^{N} W_i = \sum_{i=1}^{N} L_i
\]
*30.

31.

32.

33.

34.

35.

36.


(d) Use Wald’s equation (see Exercise 13) to conclude that
\[
E[W] = \mu
\]


where \(\mu\) is the mean of the distribution G. That is, the average time that
customers spend in the system is equal to the average work they bring to the 
system. 


For a renewal process, let A(t) be the age at time t. Prove that if \(\mu < \infty\), then with
probability 1


\[
\frac{A(t)}{t} \to 0 \quad \text{as } t \to \infty
\]


If A(t) and Y(t) are, respectively, the age and the excess at time t of a renewal 
process having an interarrival distribution F, calculate 


\[
P\{Y(t) > x \mid A(t) = s\}
\]


Determine the long-run proportion of time that \(X_{N(t)+1} < c\).
In Example 7.14, find the long-run proportion of time that the server is busy. 


An \(M/G/\infty\) queueing system is cleaned at the fixed times T, 2T, 3T, .... All
customers in service when a cleaning begins are forced to leave early and a cost \(C_1\) is
incurred for each customer. Suppose that a cleaning takes time T/4, and that all
customers who arrive while the system is being cleaned are lost, and a cost \(C_2\) is
incurred for each one.

(a) Find the long-run average cost per unit time. 

(b) Find the long-run proportion of time the system is being cleaned. 


Satellites are launched according to a Poisson process with rate \(\lambda\). Each satellite
will, independently, orbit the earth for a random time having distribution F. Let
X(t) denote the number of satellites orbiting at time t.

(a) Determine P{X(t) = k}.

Hint: Relate this to the \(M/G/\infty\) queue.


(b) If at least one satellite is orbiting, then messages can be transmitted and we 
say that the system is functional. If the first satellite is orbited at time t = 0, 
determine the expected time that the system remains functional. 


Hint: Make use of part (a) when k = 0. 


Each of n skiers continually, and independently, climbs up and then skis down a
particular slope. The time it takes skier i to climb up has distribution \(F_i\), and it is
independent of her time to ski down, which has distribution \(H_i\), i = 1, ..., n. Let
N(t) denote the total number of times members of this group have skied down the
slope by time t. Also, let U(t) denote the number of skiers climbing up the hill at
time t.

(a) What is \(\lim_{t \to \infty} N(t)/t\)?

(b) Find \(\lim_{t \to \infty} E[U(t)]\).

(c) If all \(F_i\) are exponential with rate \(\lambda\) and all \(H_i\) are exponential with rate \(\mu\),
what is P{U(t) = k}?
37.


38. 


39. 


40. 


41. 


*42. 


There are three machines, all of which are needed for a system to work. Machine
i functions for an exponential time with rate \(\lambda_i\) before it fails, i = 1, 2, 3. When
a machine fails, the system is shut down and repair begins on the failed machine.
The time to fix machine 1 is exponential with rate 5; the time to fix machine 2 is
uniform on (0, 4); and the time to fix machine 3 is a gamma random variable with
parameters n = 3 and \(\lambda = 2\). Once a failed machine is repaired, it is as good as new
and all machines are restarted.

(a) What proportion of time is the system working?

(b) What proportion of time is machine 1 being repaired?

(c) What proportion of time is machine 2 in a state of suspended animation (that
is, neither working nor being repaired)?

A truck driver regularly drives round trips from A to B and then back to A. Each
time he drives from A to B, he drives at a fixed speed that (in miles per hour) is
uniformly distributed between 40 and 60; each time he drives from B to A, he drives
at a fixed speed that is equally likely to be either 40 or 60.

(a) In the long run, what proportion of his driving time is spent going to B?

(b) In the long run, for what proportion of his driving time is he driving at a speed
of 40 miles per hour?


A system consists of two independent machines that each function for an
exponential time with rate \(\lambda\). There is a single repairperson. If the repairperson is idle when
a machine fails, then repair immediately begins on that machine; if the repairperson 
is busy when a machine fails, then that machine must wait until the other machine 
has been repaired. All repair times are independent with distribution function G 
and, once repaired, a machine is as good as new. What proportion of time is the 
repairperson idle? 


Three marksmen take turns shooting at a target. Marksman 1 shoots until he misses, 
then marksman 2 begins shooting until he misses, then marksman 3 until he misses, 
and then back to marksman 1, and so on. Each time marksman i fires he hits the 
target, independently of the past, with probability \(P_i\), i = 1, 2, 3. Determine the
proportion of time, in the long run, that each marksman shoots. 

Each time a certain machine breaks down it is replaced by a new one of the same 
type. In the long run, what percentage of time is the machine in use less than one 
year old if the life distribution of a machine is 

(a) uniformly distributed over (0, 2)? 

(b) exponentially distributed with mean 1? 


For an interarrival distribution F having mean \(\mu\), we defined the equilibrium
distribution of F, denoted \(F_e\), by


\[
F_e(x) = \frac{1}{\mu} \int_0^x [1 - F(y)]\,dy
\]


(a) Show that if F is an exponential distribution, then \(F = F_e\).
(b) If for some constant c,


\[
F(x) = \begin{cases} 0, & x < c \\ 1, & x \ge c \end{cases}
\]
44.

45.

46.

47.


show that \(F_e\) is the uniform distribution on (0, c). That is, if interarrival times
are identically equal to c, then the equilibrium distribution is the uniform 
distribution on the interval (0, c). 

(c) The city of Berkeley, California, allows for two hours parking at all non- 
metered locations within one mile of the University of California. Parking 
officials regularly tour around, passing the same point every two hours. When 
an official encounters a car he or she marks it with chalk. If the same car is 
there on the official’s return two hours later, then a parking ticket is written. 
If you park your car in Berkeley and return after three hours, what is the 
probability you will have received a ticket? 


Consider a renewal process having interarrival distribution F such that
\[
\bar{F}(x) = 1 - F(x) = \tfrac{1}{2} e^{-x} + \tfrac{1}{2} e^{-x/2}, \quad x > 0
\]


That is, interarrivals are equally likely to be exponential with mean 1 or exponential
with mean 2.

(a) Without any calculations, guess the equilibrium distribution \(F_e\).

(b) Verify your guess in part (a). 


An airport shuttle bus picks up all passengers waiting at a bus stop and drops them
off at the airport terminal; it then returns to the stop and repeats the process. The
times between returns to the stop are independent random variables with
distribution F, mean \(\mu\), and variance \(\sigma^2\). Passengers arrive at the bus stop in accordance
with a Poisson process with rate \(\lambda\). Suppose the bus has just left the stop, and let X
denote the number of passengers it picks up when it returns.

(a) Find E[X].

(b) Find Var(X).

(c) At what rate does the shuttle bus arrive at the terminal without any passengers?

Suppose that each passenger that has to wait at the bus stop more than c time
units writes an angry letter to the shuttle bus manager.

(d) What proportion of passengers write angry letters?

(e) How does your answer in part (d) relate to \(F_e(x)\)?

Consider a system that can be in either state 1 or 2 or 3. Each time the system
enters state i it remains there for a random amount of time having mean \(\mu_i\) and
then makes a transition into state j with probability \(P_{ij}\). Suppose


\[
P_{12} = 1, \quad P_{21} = P_{23} = \tfrac{1}{2}, \quad P_{31} = 1
\]


(a) What proportion of transitions takes the system into state 1?
(b) If \(\mu_1 = 1\), \(\mu_2 = 2\), \(\mu_3 = 3\), then what proportion of time does the system
spend in each state?


Consider a semi-Markov process in which the amount of time that the process 
spends in each state before making a transition into a different state is exponentially 
distributed. What kind of process is this? 


In a semi-Markov process, let \(t_{ij}\) denote the conditional expected time that the
process spends in state i given that the next state is j.

(a) Present an equation relating \(\mu_i\) to the \(t_{ij}\).

(b) Show that the proportion of time the process is in i and will next enter j is
equal to \(P_i P_{ij} t_{ij}/\mu_i\).
48.

*49.

50.

51.

52.


Hint: Say that a cycle begins each time state i is entered. Imagine that you receive 
a reward at a rate of 1 per unit time whenever the process is in i and heading for j.
What is the average reward per unit time? 


A taxi alternates between three different locations. Whenever it reaches location i, it
stops and spends a random time having mean \(t_i\) before obtaining another passenger,
i = 1, 2, 3. A passenger entering the cab at location i will want to go to location j
with probability \(P_{ij}\). The time to travel from i to j is a random variable with mean
\(m_{ij}\). Suppose that \(t_1 = 1\), \(t_2 = 2\), \(t_3 = 4\), \(P_{12} = 1\), \(P_{23} = 1\), \(P_{31} = \tfrac{2}{3} = 1 - P_{32}\),
\(m_{12} = 10\), \(m_{23} = 20\), \(m_{31} = 15\), \(m_{32} = 25\). Define an appropriate semi-Markov
process and determine

(a) the proportion of time the taxi is waiting at location i, and

(b) the proportion of time the taxi is on the road from i to j, i, j = 1, 2, 3.


Consider a renewal process having the gamma \((n, \lambda)\) interarrival distribution, and
let Y(t) denote the time from t until the next renewal. Use the theory of semi-Markov
processes to show that


\[
\lim_{t \to \infty} P\{Y(t) < x\} = \frac{1}{n} \sum_{i=1}^{n} G_{i,\lambda}(x)
\]


where \(G_{i,\lambda}(x)\) is the gamma \((i, \lambda)\) distribution function.


To prove Equation (7.24), define the following notation:


\(X_i^j\) = time spent in state i on the jth visit to this state;

\(N_i(m)\) = number of visits to state i in the first m transitions


In terms of this notation, write expressions for

(a) the amount of time during the first m transitions that the process is in state i;

(b) the proportion of time during the first m transitions that the process is in
state i.

Argue that, with probability 1,

(c) \(\sum_{j=1}^{N_i(m)} X_i^j / N_i(m) \to \mu_i\) as \(m \to \infty\)

(d) \(N_i(m)/m \to \pi_i\) as \(m \to \infty\).

(e) Combine parts (a), (b), (c), and (d) to prove Equation (7.24). 

In 1984 the country of Morocco in an attempt to determine the average amount 
of time that tourists spend in that country on a visit tried two different sampling 
procedures. In one, they questioned randomly chosen tourists as they were leaving 
the country; in the other, they questioned randomly chosen guests at hotels. (Each 
tourist stayed at a hotel.) The average visiting time of the 3000 tourists chosen from 
hotels was 17.8, whereas the average visiting time of the 12,321 tourists questioned 
at departure was 9.0. Can you explain this discrepancy? Does it necessarily imply 
a mistake? 


Let \(X_i\), i = 1, 2, ..., be the interarrival times of the renewal process {N(t)}, and let
Y, independent of the \(X_i\), be exponential with rate \(\lambda\).
(a) Use the lack of memory property of the exponential to argue that
\[
P\{X_1 + \cdots + X_n < Y\} = (P\{X < Y\})^n
\]
(b) Use part (a) to show that


\[
E[N(Y)] = \frac{E[e^{-\lambda X}]}{1 - E[e^{-\lambda X}]}
\]


where X has the interarrival distribution. 


53. Write a program to approximate m(t) for the interarrival distribution \(F * G\), where
F is exponential with mean 1 and G is exponential with mean 3.

54. Let \(X_i\), \(i \ge 1\), be independent random variables with \(p_j = P\{X = j\}\), \(j \ge 1\). If
\(p_j = j/10\), j = 1, 2, 3, 4, find the expected time and the variance of the number of
variables that need be observed until the pattern 1, 2, 3, 1, 2 occurs.

55. A coin that comes up heads with probability 0.6 is continually flipped. Find the 
expected number of flips until either the sequence thht or the sequence ttt occurs, 
and find the probability that ttt occurs first.

56. Random digits, each of which is equally likely to be any of the digits 0 through 9, 
are observed in sequence. 

(a) Find the expected time until a run of 10 distinct values occurs. 
(b) Find the expected time until a run of 5 distinct values occurs. 

57. Let \(h(x) = P\{\sum_{i=1}^{T} X_i > x\}\) where \(X_1, X_2, \ldots\) are independent random variables
having distribution function F, and T is independent of the \(X_i\) and has
probability mass function \(P\{T = n\} = p^n (1 - p)\), \(n \ge 0\). Show that h(x) satisfies
Equation (7.53).

Hint: Start by conditioning on whether T = 0 or T > 0. 
8.1 Introduction


In this chapter we will study a class of models in which customers arrive in some 
random manner at a service facility. Upon arrival they are made to wait in queue 
until it is their turn to be served. Once served they are generally assumed to leave 
the system. For such models we will be interested in determining, among other 
things, such quantities as the average number of customers in the system (or in the 
queue) and the average time a customer spends in the system (or spends waiting 
in the queue). 

In Section 8.2 we derive a series of basic queueing identities that are of great use 
in analyzing queueing models. We also introduce three different sets of limiting 
probabilities that correspond to what an arrival sees, what a departure sees, and 
what an outside observer would see. 

In Section 8.3 we deal with queueing systems in which all of the defining prob- 
ability distributions are assumed to be exponential. For instance, the simplest 
such model is to assume that customers arrive in accordance with a Poisson pro- 
cess (and thus the interarrival times are exponentially distributed) and are served 
one at a time by a single server who takes an exponentially distributed length of 
time for each service. These exponential queueing models are special examples of 
continuous-time Markov chains and so can be analyzed as in Chapter 6. However, 
at the cost of a (very) slight amount of repetition we shall not assume that you 
are familiar with the material of Chapter 6, but rather we shall redevelop any 
needed material. Specifically we shall derive anew (by a heuristic argument) the
formula for the limiting probabilities. 

In Section 8.4 we consider models in which customers move randomly among 
a network of servers. The model of Section 8.4.1 is an open system in which 
customers are allowed to enter and depart the system, whereas the one studied 
in Section 8.4.2 is closed in the sense that the set of customers in the system is 
constant over time. 

In Section 8.5 we study the model M/G/1, which while assuming Poisson 
arrivals, allows the service distribution to be arbitrary. To analyze this model we 
first introduce in Section 8.5.1 the concept of work, and then use this concept in 
Section 8.5.2 to help analyze this system. In Section 8.5.3 we derive the average 
amount of time that a server remains busy between idle periods. 

In Section 8.6 we consider some variations of the model M/G/1. In particular 
in Section 8.6.1 we suppose that bus loads of customers arrive according to a 
Poisson process and that each bus contains a random number of customers. In 
Section 8.6.2 we suppose that there are two different classes of customers—with 
type 1 customers receiving service priority over type 2. 

In Section 8.6.3 we present an M/G/1 optimization example. We suppose that 
the server goes on break whenever she becomes idle, and then determine, under 
certain cost assumptions, the optimal time for her to return to service. 

In Section 8.7 we consider a model with exponential service times but where the
interarrival times between customers are allowed to have an arbitrary distribution.
We analyze this model by use of an appropriately defined Markov chain. We also 
derive the mean length of a busy period and of an idle period for this model. 

In Section 8.8 we consider a single-server system whose arrival process results 
from return visits of a finite number of possible sources. Assuming a general ser- 
vice distribution, we show how a Markov chain can be used to analyze this system. 

In the final section of the chapter we talk about multiserver systems. We 
start with loss systems, in which arrivals finding all servers busy are assumed 
to depart and as such are lost to the system. This leads to the famous result 
known as Erlang’s loss formula, which presents a simple formula for the
number of busy servers in such a model when the arrival process is Poisson and
the service distribution is general. We then discuss multiserver systems in which 
queues are allowed. However, except in the case where exponential service times 
are assumed, there are very few explicit formulas for these models. We end by 
presenting an approximation for the average time a customer waits in queue in 
a k-server model that assumes Poisson arrivals but allows for a general service 
distribution. 


8.2 Preliminaries 


In this section we will derive certain identities that are valid in the great majority 
of queueing models. 
8.2.1 Cost Equations


Some fundamental quantities of interest for queueing models are 


L, the average number of customers in the system;

\(L_Q\), the average number of customers waiting in queue;

W, the average amount of time a customer spends in the system;

\(W_Q\), the average amount of time a customer spends waiting in queue.


A large number of interesting and useful relationships between the preceding 
and other quantities of interest can be obtained by making use of the following 
idea: Imagine that entering customers are forced to pay money (according to some 
rule) to the system. We would then have the following basic cost identity: 


average rate at which the system earns

    = \(\lambda_a\) × average amount an entering customer pays (8.1)


where \(\lambda_a\) is defined to be the average arrival rate of entering customers. That is, if
N(t) denotes the number of customer arrivals by time t, then


\[
\lambda_a = \lim_{t \to \infty} \frac{N(t)}{t}
\]


We now present a heuristic proof of Equation (8.1). 


Heuristic Proof of Equation (8.1). Let T be a fixed large number. In two different
ways, we will compute the average amount of money the system has earned by
time T. On one hand, this quantity can be obtained approximately by multiplying
the average rate at which the system earns by the length of time T. On the other
hand, we can approximately compute it by multiplying the average amount paid
by an entering customer by the average number of customers entering by time
T (this latter factor is approximately \(\lambda_a T\)). Hence, both sides of Equation (8.1)
when multiplied by T are approximately equal to the average amount earned
by T. The result then follows by letting \(T \to \infty\).*

By choosing appropriate cost rules, many useful formulas can be obtained 
as special cases of Equation (8.1). For instance, by supposing that each customer 
pays $1 per unit time while in the system, Equation (8.1) yields the so-called 
Little’s formula, 


\[
L = \lambda_a W \qquad (8.2)
\]


This follows since, under this cost rule, the rate at which the system earns is just 
the number in the system, and the amount a customer pays is just equal to its 
time in the system. 
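
As a rough numerical sketch of Little's formula (an illustration added here, not part of the original derivation), the following Python snippet checks the identity on a small, hypothetical sample path. The function names `time_average_in_system` and `little_quantities`, the arrival and departure times, and the horizon of 10 time units are all assumptions chosen for illustration; the system is assumed to start and end empty.

```python
def time_average_in_system(arrivals, departures, horizon):
    """Integrate the number-in-system step function over [0, horizon]."""
    events = [(t, +1) for t in arrivals] + [(t, -1) for t in departures]
    events.sort()
    area, count, last = 0.0, 0, 0.0
    for t, step in events:
        area += count * (t - last)   # customers present since the last event
        count += step
        last = t
    area += count * (horizon - last)
    return area / horizon


def little_quantities(arrivals, departures, horizon):
    """Return (L, lambda_a, W) measured on one finite sample path."""
    n = len(arrivals)
    L = time_average_in_system(arrivals, departures, horizon)
    lambda_a = n / horizon                      # observed arrival rate
    W = sum(d - a for a, d in zip(arrivals, departures)) / n
    return L, lambda_a, W


# Hypothetical data: four customers observed over [0, 10].
arrivals = [0.5, 1.0, 4.0, 7.5]
departures = [2.0, 3.5, 5.0, 9.0]
L, lambda_a, W = little_quantities(arrivals, departures, horizon=10.0)
print(L, lambda_a * W)   # the two values should agree up to rounding
```

On a finite path that starts and ends empty, the two sides agree because both equal the total customer-time in the system divided by the horizon; for a long-run path the agreement holds in the limit.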


* This can be made into a rigorous proof provided we assume that the queueing process is regenerative 
in the sense of Section 7.5. Most models, including all the ones in this chapter, satisfy this condition. 
Similarly if we suppose that each customer pays $1 per unit time while in queue,
then Equation (8.1) yields


\[
L_Q = \lambda_a W_Q \qquad (8.3)
\]


By supposing the cost rule that each customer pays $1 per unit time while in 
service we obtain from Equation (8.1) that the 


average number of customers in service = \(\lambda_a E[S]\) (8.4)


where E[S] is defined as the average amount of time a customer spends in service. 

It should be emphasized that Equations (8.1) through (8.4) are valid for almost 
all queueing models regardless of the arrival process, the number of servers, or 
queue discipline.


8.2.2 Steady-State Probabilities 


Let X(t) denote the number of customers in the system at time t and define
\(P_n\), \(n \ge 0\), by


\[
P_n = \lim_{t \to \infty} P\{X(t) = n\}
\]


where we assume the preceding limit exists. In other words, \(P_n\) is the limiting
or long-run probability that there will be exactly n customers in the system. It
is sometimes referred to as the steady-state probability of exactly n customers in
the system. It also usually turns out that \(P_n\) equals the (long-run) proportion of
time that the system contains exactly n customers. For example, if \(P_0 = 0.3\), then
in the long run, the system will be empty of customers for 30 percent of the time.
Similarly, \(P_1 = 0.2\) would imply that for 20 percent of the time the system would
contain exactly one customer.*

Two other sets of limiting probabilities are \(\{a_n, n \ge 0\}\) and \(\{d_n, n \ge 0\}\), where


\(a_n\) = proportion of customers that find n
in the system when they arrive, and


\(d_n\) = proportion of customers leaving behind n
in the system when they depart


That is, \(P_n\) is the proportion of time during which there are n in the system; \(a_n\)
is the proportion of arrivals that find n; and \(d_n\) is the proportion of departures
that leave behind n. That these quantities need not always be equal is illustrated
by the following example.


* A sufficient condition for the validity of the dual interpretation of \(P_n\) is that the queueing process
be regenerative. 
Example 8.1 Consider a queueing model in which all customers have service
times equal to 1, and where the times between successive customers are always
greater than 1 (for instance, the interarrival times could be uniformly distributed
over (1, 2)). Hence, as every arrival finds the system empty and every departure
leaves it empty, we have


\[
a_0 = d_0 = 1
\]
However,
\[
P_0 \ne 1
\]
as the system is not always empty of customers.
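
To make the gap between what arrivals see and the time average concrete, here is a small simulation sketch of this example, assuming interarrival times uniform on (1, 2) as suggested in the text. The function name `simulate_example_8_1`, the seed, and the number of customers are illustrative assumptions. Under this assumption the server is busy for 1 time unit out of a mean interarrival time of 1.5, so the long-run fraction of time the system is empty should be near 1/3, while every arrival finds the system empty.

```python
import random


def simulate_example_8_1(num_customers, seed=0):
    """Service times equal 1; interarrival times are uniform on (1, 2)."""
    rng = random.Random(seed)
    t = 0.0                  # arrival time of the current customer
    previous_departure = 0.0
    found_empty = 0
    for _ in range(num_customers):
        t += rng.uniform(1.0, 2.0)
        if t >= previous_departure:      # previous customer has already left
            found_empty += 1
        previous_departure = t + 1.0     # this customer is served for 1 unit
    total_time = previous_departure      # observe until the last departure
    a0 = found_empty / num_customers     # fraction of arrivals finding 0
    p0 = 1.0 - num_customers / total_time  # fraction of time the system is empty
    return a0, p0


a0, p0 = simulate_example_8_1(100_000)
print(a0, p0)   # a0 is exactly 1; p0 should be close to 1/3
```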


It was, however, no accident that \(a_n\) equaled \(d_n\) in the previous example. That
arrivals and departures see the same number of customers is always true,
as is shown in the next proposition.


Proposition 8.1 In any system in which customers arrive and depart one at a
time


the rate at which arrivals find n = the rate at which departures leave n
and
\[
a_n = d_n
\]


Proof. An arrival will see n in the system whenever the number in the system goes
from n to n + 1; similarly, a departure will leave behind n whenever the number
in the system goes from n + 1 to n. Now in any interval of time T the number of
transitions from n to n + 1 must equal to within 1 the number from n + 1 to n.
(Between any two transitions from n to n + 1, there must be one from n + 1
to n, and conversely.) Hence, the rate of transitions from n to n + 1 equals the
rate from n + 1 to n; or, equivalently, the rate at which arrivals find n equals the
rate at which departures leave n. Now \(a_n\), the proportion of arrivals finding n,
can be expressed as


\[
a_n = \frac{\text{the rate at which arrivals find } n}{\text{overall arrival rate}}
\]
Similarly,
\[
d_n = \frac{\text{the rate at which departures leave } n}{\text{overall departure rate}}
\]


Thus, if the overall arrival rate is equal to the overall departure rate, then the
preceding shows that \(a_n = d_n\). On the other hand, if the overall arrival rate
exceeds the overall departure rate, then the queue size will go to infinity, implying
that \(a_n = d_n = 0\).
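
The counting step in this proof can be illustrated with a short sketch. The sample path below is hypothetical, and `crossing_counts` is an illustrative helper name, not something defined in the text; the path lists the number in the system after each arrival or departure, so consecutive entries differ by exactly 1.

```python
def crossing_counts(path, n):
    """Count transitions n -> n+1 (arrivals finding n) and
    n+1 -> n (departures leaving n) along a path of system sizes."""
    steps = list(zip(path, path[1:]))
    ups = sum(1 for a, b in steps if (a, b) == (n, n + 1))
    downs = sum(1 for a, b in steps if (a, b) == (n + 1, n))
    return ups, downs


# Hypothetical path of the number in the system after each event.
path = [0, 1, 2, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 3]
for n in range(4):
    ups, downs = crossing_counts(path, n)
    print(n, ups, downs)   # for each n the two counts differ by at most 1
```

Because the two counts can never differ by more than 1, dividing both by a long observation time gives equal rates, which is the step the proof relies on.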


Hence, on the average, arrivals and departures always see the same number 
of customers. However, as Example 8.1 illustrates, they do not, in general, see 
time averages. One important exception where they do is in the case of Poisson 
arrivals. 


Proposition 8.2 Poisson arrivals always see time averages. In particular, for 
Poisson arrivals, 


\[
P_n = a_n
\]


To understand why Poisson arrivals always see time averages, consider an
arbitrary Poisson arrival. If we knew that it arrived at time t, then the conditional
distribution of what it sees upon arrival is the same as the unconditional
distribution of the system state at time t. For knowing that an arrival occurs at time t
gives us no information about what occurred prior to t. (Since the Poisson process
has independent increments, knowing that an event occurred at time t does not
affect the distribution of what occurred prior to t.) Hence, an arrival would just
see the system according to the limiting probabilities.

Contrast the foregoing with the situation of Example 8.1 where knowing that
an arrival occurred at time t tells us a great deal about the past; in particular it
tells us that there have been no arrivals in \((t - 1, t)\). Thus, in this case, we cannot
conclude that the distribution of what an arrival at time t observes is the same as
the distribution of the system state at time t.

For a second argument as to why Poisson arrivals see time averages, note that
the total time the system is in state n by time T is (roughly) \(P_n T\). Hence, as Poisson
arrivals always arrive at rate \(\lambda\) no matter what the system state, it follows that the
number of arrivals in [0, T] that find the system in state n is (roughly) \(\lambda P_n T\). In
the long run, therefore, the rate at which arrivals find the system in state n is \(\lambda P_n\)
and, as \(\lambda\) is the overall arrival rate, it follows that \(\lambda P_n/\lambda = P_n\) is the proportion
of arrivals that find the system in state n.

The result that Poisson arrivals see time averages is called the PASTA principle. 
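
A simulation sketch can make the PASTA principle tangible. The code below assumes a single-server queue with Poisson arrivals at rate `lam` and exponential service at rate `mu`; the function name `simulate_mm1`, the rates, the seed, and the number of events are all illustrative assumptions. It compares the fraction of time the system is empty with the fraction of arrivals that find it empty.

```python
import random


def simulate_mm1(lam, mu, num_events, seed=1):
    """Return (time-average P_0, fraction of arrivals finding 0 customers)."""
    rng = random.Random(seed)
    n = 0                    # current number in the system
    time_empty = 0.0
    total_time = 0.0
    arrivals = 0
    arrivals_finding_empty = 0
    for _ in range(num_events):
        # With n customers present, events occur at rate lam (+ mu if n > 0).
        rate = lam + (mu if n > 0 else 0.0)
        hold = rng.expovariate(rate)
        total_time += hold
        if n == 0:
            time_empty += hold
        if rng.random() < lam / rate:     # next event is an arrival
            arrivals += 1
            if n == 0:
                arrivals_finding_empty += 1
            n += 1
        else:                             # next event is a departure
            n -= 1
    return time_empty / total_time, arrivals_finding_empty / arrivals


p0, a0 = simulate_mm1(lam=1.0, mu=2.0, num_events=200_000)
print(p0, a0)   # the two estimates should be close to each other
```

Only state 0 is tracked to keep the sketch short; the same comparison could be made for any state n by keeping per-state totals.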


8.3 Exponential Models 


8.3.1 A Single-Server Exponential Queueing System 


Suppose that customers arrive at a single-server service station in accordance with
a Poisson process having rate \(\lambda\). That is, the times between successive arrivals are
independent exponential random variables having mean \(1/\lambda\). Each customer,
upon arrival, goes directly into service if the server is free and, if not, the
customer joins the queue. When the server finishes serving a customer, the customer
leaves the system, and the next customer in line, if there is any, enters service.
The successive service times are assumed to be independent exponential random
variables having mean \(1/\mu\).

The preceding is called the M/M/1 queue. The two Ms refer to the fact that both
the interarrival and the service distributions are exponential (and thus
memoryless, or Markovian), and the 1 to the fact that there is a single server. To analyze
it, we shall begin by determining the limiting probabilities \(P_n\), for n = 0, 1, ....
To do so, think along the following lines. Suppose that we have an infinite
number of rooms numbered 0, 1, 2, ..., and suppose that we instruct an individual to
enter room n whenever there are n customers in the system. That is, he would be
in room 2 whenever there are two customers in the system; and if another were
to arrive, then he would leave room 2 and enter room 3. Similarly, if a service
would take place he would leave room 2 and enter room 1 (as there would now
be only one customer in the system).

Now suppose that in the long run our individual is seen to have entered room 1
at the rate of ten times an hour. Then at what rate must he have left room 1?
Clearly, at this same rate of ten times an hour. For the total number of times that
he enters room 1 must be equal to (or one greater than) the total number of times
he leaves room 1. This sort of argument thus yields the general principle that will
enable us to determine the state probabilities. Namely, for each \(n \ge 0\), the rate at
which the process enters state n equals the rate at which it leaves state n. Let us
now determine these rates. Consider first state 0. When in state 0 the process can
leave only by an arrival as clearly there cannot be a departure when the system
is empty. Since the arrival rate is \(\lambda\) and the proportion of time that the process
is in state 0 is \(P_0\), it follows that the rate at which the process leaves state 0 is
\(\lambda P_0\). On the other hand, state 0 can only be reached from state 1 via a departure.
That is, if there is a single customer in the system and he completes service, then
the system becomes empty. Since the service rate is \(\mu\) and the proportion of time
that the system has exactly one customer is \(P_1\), it follows that the rate at which
the process enters state 0 is \(\mu P_1\).

Hence, from our rate-equality principle we get our first equation, 


$$\lambda P_0 = \mu P_1$$


Now consider state 1. The process can leave this state either by an arrival (which
occurs at rate $\lambda$) or a departure (which occurs at rate $\mu$). Hence, when in state 1,
the process will leave this state at a rate of $\lambda + \mu$.* Since the proportion of
time the process is in state 1 is $P_1$, the rate at which the process leaves state 1
is $(\lambda + \mu)P_1$. On the other hand, state 1 can be entered either from state 0 via
an arrival or from state 2 via a departure. Hence, the rate at which the process
enters state 1 is $\lambda P_0 + \mu P_2$. Because the reasoning for other states is similar, we
obtain the following set of equations:


State            Rate at which the process leaves = rate at which it enters
$0$              $\lambda P_0 = \mu P_1$
$n,\ n \ge 1$    $(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+1}$        (8.5)


* If one event occurs at a rate $\lambda$ and another occurs at rate $\mu$, then the total rate at which either event
occurs is $\lambda + \mu$. Suppose one man earns \$2 per hour and another earns \$3 per hour; then together
they clearly earn \$5 per hour.


Equations (8.5), which balance the rate at which the process enters each state
with the rate at which it leaves that state, are known as balance equations.
In order to solve Equations (8.5), we rewrite them to obtain 


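$$
\begin{aligned}
P_0 &= P_0, \\
P_1 &= \frac{\lambda}{\mu} P_0, \\
P_2 &= \frac{\lambda}{\mu} P_1 + \left(P_1 - \frac{\lambda}{\mu} P_0\right) = \frac{\lambda}{\mu} P_1 = \left(\frac{\lambda}{\mu}\right)^2 P_0, \\
P_3 &= \frac{\lambda}{\mu} P_2 + \left(P_2 - \frac{\lambda}{\mu} P_1\right) = \frac{\lambda}{\mu} P_2 = \left(\frac{\lambda}{\mu}\right)^3 P_0, \\
P_4 &= \frac{\lambda}{\mu} P_3 + \left(P_3 - \frac{\lambda}{\mu} P_2\right) = \frac{\lambda}{\mu} P_3 = \left(\frac{\lambda}{\mu}\right)^4 P_0, \\
P_{n+1} &= \frac{\lambda}{\mu} P_n + \left(P_n - \frac{\lambda}{\mu} P_{n-1}\right) = \frac{\lambda}{\mu} P_n = \left(\frac{\lambda}{\mu}\right)^{n+1} P_0
\end{aligned}
$$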


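To determine $P_0$ we use the fact that the $P_n$ must sum to 1, and thus

$$1 = \sum_{n=0}^{\infty} P_n = \sum_{n=0}^{\infty} \left(\frac{\lambda}{\mu}\right)^n P_0 = \frac{P_0}{1 - \lambda/\mu}$$

or

$$P_0 = 1 - \frac{\lambda}{\mu}, \qquad P_n = \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right), \quad n \ge 1 \tag{8.6}$$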




Notice that for the preceding equations to make sense, it is necessary for $\lambda/\mu$ to
be less than 1. For otherwise $\sum_{n=0}^{\infty} (\lambda/\mu)^n$ would be infinite and all the $P_n$ would
be 0. Hence, we shall assume that $\lambda/\mu < 1$. Note that it is quite intuitive that
there would be no limiting probabilities if $\lambda > \mu$. For suppose that $\lambda > \mu$. Since
customers arrive at a Poisson rate $\lambda$, it follows that the expected total number
of arrivals by time $t$ is $\lambda t$. On the other hand, what is the expected number of
customers served by time $t$? If there were always customers present, then the
number of customers served would be a Poisson process having rate $\mu$ since
the time between successive services would be independent exponentials having
mean $1/\mu$. Hence, the expected number of customers served by time $t$ is no
greater than $\mu t$; and, therefore, the expected number in the system at time $t$ is at
least

$$\lambda t - \mu t = (\lambda - \mu)t$$

Now, if $\lambda > \mu$, then the preceding number goes to infinity as $t$ becomes large. That
is, if $\lambda/\mu > 1$, the queue size increases without limit and there will be no limiting
probabilities. Note also that the condition $\lambda/\mu < 1$ is equivalent to the condition
that the mean service time be less than the mean time between successive arrivals.
This is the general condition that must be satisfied for limiting probabilities to
exist in most single-server queueing systems.
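As an informal numerical check of Equations (8.5) and (8.6), the following Python sketch computes the M/M/1 limiting probabilities and the residuals of the balance equations. The function names and the example rates ($\lambda = 1$, $\mu = 2$) are illustrative assumptions, not part of the text.

```python
def mm1_limiting_probs(lam, mu, n_max):
    """Return [P_0, ..., P_n_max] for an M/M/1 queue using Equation (8.6)."""
    if lam >= mu:
        # No limiting probabilities exist unless lam / mu < 1.
        raise ValueError("limiting probabilities exist only when lam < mu")
    rho = lam / mu
    return [(rho ** n) * (1 - rho) for n in range(n_max + 1)]


def balance_residuals(lam, mu, probs):
    """Residuals (rate out minus rate in) of Equations (8.5).

    Covers states 0 .. len(probs) - 2, since state n needs P_{n+1}.
    """
    residuals = [lam * probs[0] - mu * probs[1]]
    for n in range(1, len(probs) - 1):
        rate_out = (lam + mu) * probs[n]
        rate_in = lam * probs[n - 1] + mu * probs[n + 1]
        residuals.append(rate_out - rate_in)
    return residuals


probs = mm1_limiting_probs(lam=1.0, mu=2.0, n_max=50)
print(probs[:4])
print(max(abs(r) for r in balance_residuals(1.0, 2.0, probs)))
```

The residuals should be zero up to floating-point rounding, and calling the first function with $\lambda \ge \mu$ raises an error, mirroring the condition $\lambda/\mu < 1$.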


Remarks 


(i) In solving the balance equations for the M/M/1 queue, we obtained as an intermediate
step the set of equations

$$\lambda P_n = \mu P_{n+1}, \quad n \ge 0$$

These equations could have been directly argued from the general queueing result
(shown in Proposition 8.1) that the rate at which arrivals find $n$ in the system,
namely $\lambda P_n$, is equal to the rate at which departures leave behind $n$, namely
$\mu P_{n+1}$.

(ii) We can also prove that $P_n = (\lambda/\mu)^n (1 - \lambda/\mu)$ by using a queueing cost identity.
Suppose that, for a fixed $n > 0$, whenever there are at least $n$ customers in the
system the $n$th oldest customer (with age measured from when the customer arrived)
pays 1 per unit time. Letting $X$ be the steady state number of customers in the
system, because the system earns 1 per unit time whenever $X$ is at least $n$, it follows
that

$$\text{average rate at which the system earns} = P\{X \ge n\}$$

Also, because a customer who finds fewer than $n - 1$ in the system when it arrives
will pay 0, while an arrival who finds at least $n - 1$ in the system will pay 1 per unit
time for an exponentially distributed time with rate $\mu$,

$$\text{average amount a customer pays} = \frac{1}{\mu} P\{X \ge n - 1\}$$

Therefore, the queueing cost identity yields

$$P\{X \ge n\} = (\lambda/\mu) P\{X \ge n - 1\}, \quad n > 0$$

Iterating this gives

$$
\begin{aligned}
P\{X \ge n\} &= (\lambda/\mu) P\{X \ge n - 1\} \\
&= (\lambda/\mu)^2 P\{X \ge n - 2\} \\
&= \cdots \\
&= (\lambda/\mu)^n P\{X \ge 0\} \\
&= (\lambda/\mu)^n
\end{aligned}
$$

Therefore,

$$P\{X = n\} = P\{X \ge n\} - P\{X \ge n + 1\} = (\lambda/\mu)^n (1 - \lambda/\mu)$$ ■


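Now let us attempt to express the quantities $L$, $L_Q$, $W$, and $W_Q$ in terms of
the limiting probabilities $P_n$. Since $P_n$ is the long-run probability that the system
contains exactly $n$ customers, the average number of customers in the system
clearly is given by

$$L = \sum_{n=0}^{\infty} n P_n = \sum_{n=0}^{\infty} n \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right) = \frac{\lambda}{\mu - \lambda} \tag{8.7}$$

where the last equation followed upon application of the algebraic identity

$$\sum_{n=0}^{\infty} n x^n = \frac{x}{(1 - x)^2}$$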


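The quantities $W$, $W_Q$, and $L_Q$ now can be obtained with the help of
Equations (8.2) and (8.3). That is, since $\lambda_a = \lambda$, we have from Equation (8.7)
that

$$
\begin{aligned}
W &= \frac{L}{\lambda} = \frac{1}{\mu - \lambda}, \\
W_Q &= W - E[S] = W - \frac{1}{\mu} = \frac{\lambda}{\mu(\mu - \lambda)}, \\
L_Q &= \lambda W_Q = \frac{\lambda^2}{\mu(\mu - \lambda)}
\end{aligned}
\tag{8.8}
$$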


Example 8.2 Suppose that customers arrive at a Poisson rate of one per every 
12 minutes, and that the service time is exponential at a rate of one service per 
8 minutes. What are L and W? 


Solution: Since $\lambda = \frac{1}{12}$, $\mu = \frac{1}{8}$, we have

$$L = 2, \qquad W = 24$$

Hence, the average number of customers in the system is 2, and the average
time a customer spends in the system is 24 minutes.

Now suppose that the arrival rate increases 20 percent to $\lambda = \frac{1}{10}$. What is
the corresponding change in $L$ and $W$? Again using Equations (8.7) and (8.8), we get

$$L = 4, \qquad W = 40$$

Hence, an increase of 20 percent in the arrival rate doubled the average number
of customers in the system.
To understand this better, write Equations (8.7) and (8.8) as

$$L = \frac{\lambda/\mu}{1 - \lambda/\mu}, \qquad W = \frac{1/\mu}{1 - \lambda/\mu}$$

From these equations we can see that when $\lambda/\mu$ is near 1, a slight increase in
$\lambda/\mu$ will lead to a large increase in $L$ and $W$. ■
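The arithmetic of Example 8.2 can be reproduced with exact rational numbers. This is a minimal sketch; the function name `mm1_measures` is an illustrative choice, and rates are expressed per minute as in the example.

```python
from fractions import Fraction


def mm1_measures(lam, mu):
    """Return (L, W, W_Q, L_Q) for an M/M/1 queue with lam < mu."""
    if lam >= mu:
        raise ValueError("requires lam < mu")
    L = lam / (mu - lam)      # Equation (8.7)
    W = L / lam               # from L = lam * W
    W_Q = W - 1 / mu          # subtract the mean service time E[S] = 1/mu
    L_Q = lam * W_Q
    return L, W, W_Q, L_Q


# One arrival per 12 minutes, one service per 8 minutes.
base = mm1_measures(Fraction(1, 12), Fraction(1, 8))
# Arrival rate raised by 20 percent, to one per 10 minutes.
busier = mm1_measures(Fraction(1, 10), Fraction(1, 8))
print(base[:2])
print(busier[:2])
```

Using `Fraction` avoids rounding, so the values $L = 2$, $W = 24$ and $L = 4$, $W = 40$ come out exactly.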


A Technical Remark We have used the fact that if one event occurs at an exponential
rate $\lambda$, and another independent event at an exponential rate $\mu$, then
together they occur at an exponential rate $\lambda + \mu$. To check this formally, let $T_1$
be the time at which the first event occurs, and $T_2$ the time at which the second
event occurs. Then

$$
\begin{aligned}
P\{T_1 \le t\} &= 1 - e^{-\lambda t}, \\
P\{T_2 \le t\} &= 1 - e^{-\mu t}
\end{aligned}
$$




Now if we are interested in the time until either $T_1$ or $T_2$ occurs, then we are
interested in $T = \min(T_1, T_2)$. Now,

$$
\begin{aligned}
P\{T \le t\} &= 1 - P\{T > t\} \\
&= 1 - P\{\min(T_1, T_2) > t\}
\end{aligned}
$$

However, $\min(T_1, T_2) > t$ if and only if both $T_1$ and $T_2$ are greater than $t$; hence,

$$
\begin{aligned}
P\{T \le t\} &= 1 - P\{T_1 > t, T_2 > t\} \\
&= 1 - P\{T_1 > t\} P\{T_2 > t\} \\
&= 1 - e^{-\lambda t} e^{-\mu t} \\
&= 1 - e^{-(\lambda + \mu)t}
\end{aligned}
$$

Thus, $T$ has an exponential distribution with rate $\lambda + \mu$, and we are justified in
adding the rates. ■
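A small simulation can make the rate-adding result concrete. The sketch below, with illustrative rates $\lambda = 2$ and $\mu = 3$ and a fixed seed chosen only for reproducibility, estimates the mean of $\min(T_1, T_2)$ and the tail probability $P\{T > t\}$, which should be close to $1/(\lambda + \mu)$ and $e^{-(\lambda + \mu)t}$ respectively.

```python
import math
import random


def simulate_min(lam, mu, t, trials, seed=0):
    """Estimate E[min(T1, T2)] and P{min(T1, T2) > t} by simulation."""
    rng = random.Random(seed)
    total = 0.0
    exceed = 0
    for _ in range(trials):
        # Independent exponential times with rates lam and mu.
        first = min(rng.expovariate(lam), rng.expovariate(mu))
        total += first
        if first > t:
            exceed += 1
    return total / trials, exceed / trials


mean_est, tail_est = simulate_min(lam=2.0, mu=3.0, t=0.2, trials=200_000)
print(mean_est, 1 / (2.0 + 3.0))
print(tail_est, math.exp(-(2.0 + 3.0) * 0.2))
```

The estimates are random, so they agree with the exact values only up to sampling error.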


Given that an M/M/1 steady-state customer (that is, a customer who arrives
after the system has been in operation a long time) spends a total of $t$ time units
in the system, let us determine the conditional distribution of $N$, the number of
others that were present when that customer arrived. That is, letting $W^*$ be the
amount of time a customer spends in the system, we will find $P\{N = n \mid W^* = t\}$.
Now,

$$
\begin{aligned}
P\{N = n \mid W^* = t\} &= \frac{f_{N,W^*}(n, t)}{f_{W^*}(t)} \\
&= \frac{P\{N = n\} f_{W^*|N}(t \mid n)}{f_{W^*}(t)}
\end{aligned}
$$

where $f_{W^*|N}(t \mid n)$ is the conditional density of $W^*$ given that $N = n$, and $f_{W^*}(t)$
is the unconditional density of $W^*$. Now, given that $N = n$, the time that the
customer spends in the system is distributed as the sum of $n + 1$ independent
exponential random variables with a common rate $\mu$, implying that the conditional
distribution of $W^*$ given that $N = n$ is the gamma distribution with
parameters $n + 1$ and $\mu$. Therefore, with $C = 1/f_{W^*}(t)$,


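$$
\begin{aligned}
P\{N = n \mid W^* = t\} &= C\, P\{N = n\}\, \mu e^{-\mu t} \frac{(\mu t)^n}{n!} \\
&= C (\lambda/\mu)^n (1 - \lambda/\mu)\, \mu e^{-\mu t} \frac{(\mu t)^n}{n!} \qquad \text{(by PASTA)} \\
&= K \frac{(\lambda t)^n}{n!}
\end{aligned}
$$

where $K = C(1 - \lambda/\mu)\mu e^{-\mu t}$ does not depend on $n$. Summing over $n$ yields

$$1 = \sum_{n=0}^{\infty} P\{N = n \mid W^* = t\} = K \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} = K e^{\lambda t}$$

Thus, $K = e^{-\lambda t}$, showing that

$$P\{N = n \mid W^* = t\} = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$$

Therefore, the conditional distribution of the number seen by an arrival who
spends a total of $t$ time units in the system is the Poisson distribution with mean $\lambda t$.
In addition, as a by-product of our analysis, we have

$$
\begin{aligned}
f_{W^*}(t) &= 1/C \\
&= \frac{1}{K} (1 - \lambda/\mu) \mu e^{-\mu t} \\
&= (\mu - \lambda) e^{-(\mu - \lambda)t}
\end{aligned}
$$

In other words, $W^*$, the amount of time a customer spends in the system, is an
exponential random variable with rate $\mu - \lambda$. (As a check, we note that $E[W^*] = 1/(\mu - \lambda)$,
which checks with Equation (8.8) since $W = E[W^*]$.)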
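The conditioning argument above can be checked numerically: mixing gamma densities with the geometric weights $P\{N = n\}$ should reproduce the exponential density with rate $\mu - \lambda$, and the resulting posterior should be Poisson with mean $\lambda t$. The sketch below is illustrative only; the function names and the truncation at 100 terms are assumptions made for the example.

```python
import math


def gamma_density(t, shape, rate):
    """Density at t of a gamma distribution with integer shape and given rate."""
    k = shape - 1
    return rate * math.exp(-rate * t) * (rate * t) ** k / math.factorial(k)


def sojourn_density_by_conditioning(t, lam, mu, n_terms=100):
    """f_{W*}(t) as a sum over n of P{N = n} times the gamma(n + 1, mu) density."""
    rho = lam / mu
    return sum(
        (rho ** n) * (1 - rho) * gamma_density(t, n + 1, mu)
        for n in range(n_terms)
    )


def posterior_number_found(n, t, lam, mu):
    """P{N = n | W* = t} computed directly from the ratio of densities."""
    rho = lam / mu
    joint = (rho ** n) * (1 - rho) * gamma_density(t, n + 1, mu)
    return joint / sojourn_density_by_conditioning(t, lam, mu)


lam, mu, t = 1.0, 2.0, 1.5
print(sojourn_density_by_conditioning(t, lam, mu), (mu - lam) * math.exp(-(mu - lam) * t))
print(posterior_number_found(2, t, lam, mu), math.exp(-lam * t) * (lam * t) ** 2 / 2)
```

Each printed pair should agree up to rounding, since the truncated terms are negligible for moderate $\lambda t$.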


Remark Another argument as to why $W^*$ is exponential with rate $\mu - \lambda$ is as
follows. If we let $N$ denote the number of customers in the system as seen by
an arrival, then this arrival will spend $N + 1$ service times in the system before
departing. Now,

$$P\{N + 1 = j\} = P\{N = j - 1\} = (\lambda/\mu)^{j-1} (1 - \lambda/\mu), \quad j \ge 1$$

In words, the number of services that have to be completed before the arrival
departs is a geometric random variable with parameter $1 - \lambda/\mu$. Therefore, after
each service completion our customer will be the one departing with probability
$1 - \lambda/\mu$. Thus, no matter how long the customer has already spent in the system,
the probability he will depart in the next $h$ time units is $\mu h + o(h)$, the probability
that a service ends in that time, multiplied by $1 - \lambda/\mu$. That is, the customer will
depart in the next $h$ time units with probability $(\mu - \lambda)h + o(h)$, which says that
the hazard rate function of $W^*$ is the constant $\mu - \lambda$. But only the exponential
has a constant hazard rate, and so we can conclude that $W^*$ is exponential with
rate $\mu - \lambda$.


Our next example illustrates the inspection paradox. 


Example 8.3 For an M/M/1 queue in steady state, what is the probability that
the next arrival finds $n$ in the system?

Solution: Although it might initially seem, by the PASTA principle, that this
probability should just be $(\lambda/\mu)^n (1 - \lambda/\mu)$, we must be careful. Because if $t$
is the current time, then the time from $t$ until the next arrival is exponentially
distributed with rate $\lambda$, and is independent of the time from $t$ since the last
arrival, which (in the limit, as $t$ goes to infinity) is also exponential with rate $\lambda$.
Thus, although the times between successive arrivals of a Poisson process are
exponential with rate $\lambda$, the time between the previous arrival before $t$ and the
first arrival after $t$ is distributed as the sum of two independent exponentials.
(This is an illustration of the inspection paradox, which results because the
length of an interarrival interval that contains a specified time tends to be
longer than an ordinary interarrival interval; see Section 7.7.)

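Let $N_a$ denote the number found by the next arrival, and let $X$ be the number
currently in the system. Conditioning on $X$ yields

$$
\begin{aligned}
P\{N_a = n\} &= \sum_{k=0}^{\infty} P\{N_a = n \mid X = k\} P\{X = k\} \\
&= \sum_{k=0}^{\infty} P\{N_a = n \mid X = k\} (\lambda/\mu)^k (1 - \lambda/\mu) \\
&= \sum_{k=n}^{\infty} P\{N_a = n \mid X = k\} (\lambda/\mu)^k (1 - \lambda/\mu) \\
&= \sum_{i=0}^{\infty} P\{N_a = n \mid X = n + i\} (\lambda/\mu)^{n+i} (1 - \lambda/\mu)
\end{aligned}
$$

Now, for $n > 0$, given there are currently $n + i$ in the system, the next arrival
will find $n$ if we have $i$ services before an arrival and then an arrival before
the next service completion. By the lack of memory property of exponential
interarrival random variables, this gives

$$P\{N_a = n \mid X = n + i\} = \left(\frac{\mu}{\lambda + \mu}\right)^i \frac{\lambda}{\lambda + \mu}, \quad n > 0$$

Consequently, for $n > 0$,

$$
\begin{aligned}
P\{N_a = n\} &= \sum_{i=0}^{\infty} \left(\frac{\mu}{\lambda + \mu}\right)^i \frac{\lambda}{\lambda + \mu} \left(\frac{\lambda}{\mu}\right)^{n+i} (1 - \lambda/\mu) \\
&= (\lambda/\mu)^n (1 - \lambda/\mu) \frac{\lambda}{\lambda + \mu} \sum_{i=0}^{\infty} \left(\frac{\lambda}{\lambda + \mu}\right)^i \\
&= (\lambda/\mu)^{n+1} (1 - \lambda/\mu)
\end{aligned}
$$

On the other hand, the probability that the next arrival will find the system
empty, when there are currently $i$ in the system, is the probability that there
are $i$ services before the next arrival. Therefore, $P\{N_a = 0 \mid X = i\} = \left(\frac{\mu}{\lambda + \mu}\right)^i$,
giving

$$
\begin{aligned}
P\{N_a = 0\} &= \sum_{i=0}^{\infty} \left(\frac{\mu}{\lambda + \mu}\right)^i \left(\frac{\lambda}{\mu}\right)^i (1 - \lambda/\mu) \\
&= (1 - \lambda/\mu) \sum_{i=0}^{\infty} \left(\frac{\lambda}{\lambda + \mu}\right)^i \\
&= (1 + \lambda/\mu)(1 - \lambda/\mu)
\end{aligned}
$$

As a check, note that

$$
\begin{aligned}
\sum_{n=0}^{\infty} P\{N_a = n\} &= (1 - \lambda/\mu) \left[ 1 + \lambda/\mu + \sum_{n=1}^{\infty} (\lambda/\mu)^{n+1} \right] \\
&= (1 - \lambda/\mu) \sum_{i=0}^{\infty} (\lambda/\mu)^i \\
&= 1
\end{aligned}
$$

Note that $P\{N_a = 0\}$ is larger than $P_0 = 1 - \lambda/\mu$, showing that the next
arrival is more likely to find an empty system than is an average arrival, and
thus illustrating the inspection paradox that when the next customer arrives
the elapsed time since the previous arrival is distributed as the sum of two
independent exponentials with rate $\lambda$. Also, we might expect because of the
inspection paradox that $E[N_a]$ is less than $L$, the average number of customers
seen by an arrival. That this is indeed the case is seen from

$$E[N_a] = \sum_{n=1}^{\infty} n (\lambda/\mu)^{n+1} (1 - \lambda/\mu) = \frac{\lambda}{\mu} L < L$$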
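The closed-form distribution of $N_a$ can be compared with a direct evaluation of the conditioning sum. The sketch below uses illustrative function names and truncates the infinite sum at 500 terms, which is an assumption that is harmless when $\lambda/\mu$ is well below 1.

```python
def next_arrival_pmf(n, lam, mu):
    """Closed form for P{N_a = n} derived in Example 8.3."""
    rho = lam / mu
    if n == 0:
        return (1 + rho) * (1 - rho)
    return rho ** (n + 1) * (1 - rho)


def next_arrival_pmf_by_conditioning(n, lam, mu, terms=500):
    """P{N_a = n} obtained by conditioning on the current number X = n + i."""
    rho = lam / mu
    service_first = mu / (lam + mu)   # a service completes before the next arrival
    arrival_first = lam / (lam + mu)  # the arrival comes before the next service
    total = 0.0
    for i in range(terms):
        cond = service_first ** i
        if n > 0:
            # After i services, the arrival must precede the next completion.
            cond *= arrival_first
        total += cond * rho ** (n + i) * (1 - rho)
    return total


lam, mu = 1.0, 2.0
for n in range(4):
    print(n, next_arrival_pmf(n, lam, mu), next_arrival_pmf_by_conditioning(n, lam, mu))
```

With $\lambda = 1$ and $\mu = 2$, the mean of this distribution equals $(\lambda/\mu)L$, which is smaller than $L = \lambda/(\mu - \lambda)$.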


8.3.2 A Single-Server Exponential Queueing System Having Finite Capacity


In the previous model, we assumed that there was no limit on the number of 
customers that could be in the system at the same time. However, in reality there 
is always a finite system capacity N, in the sense that there can be no more than 
N customers in the system at any time. By this, we mean that if an arriving 
customer finds that there are already N customers present, then he does not enter 
the system. 




As before, we let $P_n$, $0 \le n \le N$, denote the limiting probability that there are
$n$ customers in the system. The rate-equality principle yields the following set of
balance equations:


State                  Rate at which the process leaves = rate at which it enters
$0$                    $\lambda P_0 = \mu P_1$
$1 \le n \le N - 1$    $(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+1}$
$N$                    $\mu P_N = \lambda P_{N-1}$

The argument for state 0 is exactly as before. Namely, when in state 0, the
process will leave only via an arrival (which occurs at rate $\lambda$) and hence the rate
at which the process leaves state 0 is $\lambda P_0$. On the other hand, the process can
enter state 0 only from state 1 via a departure; hence, the rate at which the process
enters state 0 is $\mu P_1$. The equation for state $n$, where $1 \le n < N$, is the same as
before. The equation for state $N$ is different because now state $N$ can only be left
via a departure since an arriving customer will not enter the system when it is in
state $N$; also, state $N$ can now only be entered from state $N - 1$ (as there is no
longer a state $N + 1$) via an arrival.

We could now either solve the balance equations exactly as we did for the
infinite capacity model, or we could save a few lines by directly using the result
that the rate at which departures leave behind $n - 1$ is equal to the rate at which
arrivals find $n - 1$. Invoking this result yields

$$\mu P_n = \lambda P_{n-1}, \quad n = 1, \ldots, N$$

giving

$$P_n = \frac{\lambda}{\mu} P_{n-1} = \left(\frac{\lambda}{\mu}\right)^2 P_{n-2} = \cdots = \left(\frac{\lambda}{\mu}\right)^n P_0, \quad n = 1, \ldots, N$$


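By using the fact that $\sum_{n=0}^{N} P_n = 1$ we obtain

$$
\begin{aligned}
1 &= P_0 \sum_{n=0}^{N} \left(\frac{\lambda}{\mu}\right)^n \\
&= P_0 \left[ \frac{1 - (\lambda/\mu)^{N+1}}{1 - \lambda/\mu} \right]
\end{aligned}
$$

or

$$P_0 = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}}$$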




and hence from the preceding we obtain

$$P_n = \frac{(\lambda/\mu)^n (1 - \lambda/\mu)}{1 - (\lambda/\mu)^{N+1}}, \quad n = 0, 1, \ldots, N$$

Note that in this case, there is no need to impose the condition that $\lambda/\mu < 1$. The
queue size is, by definition, bounded so there is no possibility of its increasing
indefinitely.

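As before, $L$ may be expressed in terms of $P_n$ to yield

$$L = \sum_{n=0}^{N} n P_n = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}} \sum_{n=0}^{N} n \left(\frac{\lambda}{\mu}\right)^n$$

which after some algebra yields

$$L = \frac{\lambda \left[ 1 + N(\lambda/\mu)^{N+1} - (N + 1)(\lambda/\mu)^N \right]}{(\mu - \lambda)\left(1 - (\lambda/\mu)^{N+1}\right)}$$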


In deriving W, the expected amount of time a customer spends in the system, 
we must be a little careful about what we mean by a customer. Specifically, are 
we including those “customers” who arrive to find the system full and thus do 
not spend any time in the system? Or, do we just want the expected time spent 
in the system by a customer who actually entered the system? The two questions 
lead, of course, to different answers. In the first case, we have $\lambda_a = \lambda$; whereas
in the second case, since the fraction of arrivals that actually enter the system
is $1 - P_N$, it follows that $\lambda_a = \lambda(1 - P_N)$. Once it is clear what we mean by a
customer, $W$ can be obtained from

$$W = \frac{L}{\lambda_a}$$
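The finite-capacity formulas can be evaluated directly. In the sketch below (function names and the `count_blocked` flag are illustrative assumptions), the probabilities are obtained by normalizing the weights $(\lambda/\mu)^n$, which also works when $\lambda = \mu$, while the closed form for $L$ assumes $\lambda \ne \mu$.

```python
def mm1n_probs(lam, mu, N):
    """Limiting probabilities P_0..P_N of the M/M/1 queue with capacity N."""
    rho = lam / mu
    weights = [rho ** n for n in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mm1n_mean_number(lam, mu, N):
    """Closed form for L; assumes lam != mu."""
    rho = lam / mu
    numerator = lam * (1 + N * rho ** (N + 1) - (N + 1) * rho ** N)
    return numerator / ((mu - lam) * (1 - rho ** (N + 1)))


def mm1n_wait(lam, mu, N, count_blocked=False):
    """W = L / lam_a, with lam_a depending on who counts as a customer."""
    probs = mm1n_probs(lam, mu, N)
    L = sum(n * p for n, p in enumerate(probs))
    # Blocked arrivals spend no time in the system; include them only if asked.
    lam_a = lam if count_blocked else lam * (1 - probs[N])
    return L / lam_a


# The arrival rate may exceed the service rate because the queue is bounded.
print(mm1n_probs(3.0, 2.0, 4))
print(mm1n_mean_number(3.0, 2.0, 4))
print(mm1n_wait(3.0, 2.0, 4), mm1n_wait(3.0, 2.0, 4, count_blocked=True))
```

Averaging over entering customers only gives the larger of the two waiting times, since blocked arrivals contribute zeros to the other average.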


Example 8.4 Suppose that it costs $c\mu$ dollars per hour to provide service at a
rate $\mu$. Suppose also that we incur a gross profit of $A$ dollars for each customer
served. If the system has a capacity $N$, what service rate $\mu$ maximizes our total
profit?


Solution: To solve this, suppose that we use rate $\mu$. Let us determine the
amount of money coming in per hour and subtract from this the amount going
out each hour. This will give us our profit per hour, and we can choose $\mu$ so
as to maximize this.
Now, potential customers arrive at a rate $\lambda$. However, a certain proportion
of them do not join the system, namely, those who arrive when there are $N$
customers already in the system. Hence, since $P_N$ is the proportion of time
that the system is full, it follows that entering customers arrive at a rate of
$\lambda(1 - P_N)$. Since each customer pays $A$ dollars, it follows that money comes in at
an hourly rate of $\lambda(1 - P_N)A$ and since it goes out at an hourly rate of $c\mu$, it
follows that our total profit per hour is given by

$$
\begin{aligned}
\text{profit per hour} &= \lambda(1 - P_N)A - c\mu \\
&= \lambda A \left[ 1 - \frac{(\lambda/\mu)^N (1 - \lambda/\mu)}{1 - (\lambda/\mu)^{N+1}} \right] - c\mu \\
&= \frac{\lambda A \left[ 1 - (\lambda/\mu)^N \right]}{1 - (\lambda/\mu)^{N+1}} - c\mu
\end{aligned}
$$

For instance if $N = 2$, $\lambda = 1$, $A = 10$, $c = 1$, then

$$
\begin{aligned}
\text{profit per hour} &= \frac{10\left[1 - (1/\mu)^2\right]}{1 - (1/\mu)^3} - \mu \\
&= \frac{10(\mu^3 - \mu)}{\mu^3 - 1} - \mu
\end{aligned}
$$

In order to maximize profit we differentiate to obtain

$$\frac{d}{d\mu}[\text{profit per hour}] = 10\, \frac{2\mu^3 - 3\mu^2 + 1}{(\mu^3 - 1)^2} - 1$$

The value of $\mu$ that maximizes our profit now can be obtained by equating to
zero and solving numerically. ■
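One way to carry out the numerical step is bisection on the derivative. The sketch below is only an illustration for the case $N = 2$, $\lambda = 1$, $A = 10$, $c = 1$; the bracketing interval $[1.5, 3]$ is an assumption chosen because the derivative is positive at 1.5 and negative at 3, and the helper names are hypothetical.

```python
def profit_per_hour(mu, lam=1.0, A=10.0, c=1.0, N=2):
    """Profit per hour from Example 8.4; undefined at mu == lam (0/0)."""
    rho = lam / mu
    return lam * A * (1 - rho ** N) / (1 - rho ** (N + 1)) - c * mu


def profit_derivative(mu):
    """Derivative of the profit for N = 2, lam = 1, A = 10, c = 1."""
    return 10 * (2 * mu ** 3 - 3 * mu ** 2 + 1) / (mu ** 3 - 1) ** 2 - 1


def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    if f(lo) * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


best_mu = bisect_root(profit_derivative, 1.5, 3.0)
print(best_mu, profit_per_hour(best_mu))
```

Any other root finder would serve equally well; bisection is used here only because it needs nothing beyond a sign change.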


We say that a queueing system alternates between idle periods when there 
are no customers in the system and busy periods in which there is at least one 
customer in the system. We will end this section by determining the expected 
value and variance of the number of lost customers in a busy period, where a 
customer is said to be lost if it arrives when the system is at capacity. 

To determine the preceding quantities, let $L_n$ denote the number of lost customers
in a busy period of a finite capacity M/M/1 queue in which an arrival
finding $n$ others does not join the system. To derive an expression for $E[L_n]$ and
$\mathrm{Var}(L_n)$, suppose a busy period has just begun and condition on whether the next
event is an arrival or a departure. Now, with

$$I = \begin{cases} 0, & \text{if service completion occurs before next arrival} \\ 1, & \text{if arrival before service completion} \end{cases}$$

note that if $I = 0$ then the busy period will end before the next arrival and so
there will be no lost customers in that busy period. As a result

$$E[L_n \mid I = 0] = \mathrm{Var}(L_n \mid I = 0) = 0$$


Now suppose that the next arrival appears before the end of the first service
time, and so $I = 1$. Then if $n = 1$ that arrival will be lost and it will be as if the
busy period were just beginning anew at that point, yielding that the conditional
number of lost customers has the same distribution as does $1 + L_1$. On the other
hand, if $n > 1$ then at the moment of the arrival there will be two customers in
the system, the one in service and the “second customer” who has just arrived.
Because the distribution of the number of lost customers in a busy period does
not depend on the order in which customers are served, let us suppose that the
“second customer” is put aside and does not receive any service until it is the only
remaining customer. Then it is easy to see that the number of lost customers until
that “second customer” begins service has the same distribution as the number
of lost customers in a busy period when the system capacity is $n - 1$. Moreover,
the additional number of lost customers in the busy period starting when service
begins on the “second customer” has the distribution of the number of lost
customers in a busy period when the system capacity is $n$. Consequently, given
$I = 1$, $L_n$ has the distribution of the sum of two independent random variables:
one of which is distributed as $L_{n-1}$ and represents the number of lost customers
before there is again only a single customer in the system, and the other which
is distributed as $L_n$ and represents the additional number of lost customers from
the moment when there is again a single customer until the busy period ends.
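Hence,

$$E[L_n \mid I = 1] = \begin{cases} 1 + E[L_1], & \text{if } n = 1 \\ E[L_{n-1}] + E[L_n], & \text{if } n > 1 \end{cases}$$

and

$$\mathrm{Var}(L_n \mid I = 1) = \begin{cases} \mathrm{Var}(L_1), & \text{if } n = 1 \\ \mathrm{Var}(L_{n-1}) + \mathrm{Var}(L_n), & \text{if } n > 1 \end{cases}$$

Letting

$$m_n = E[L_n] \quad \text{and} \quad v_n = \mathrm{Var}(L_n)$$

then, with $m_0 = 1$, $v_0 = 0$, the preceding equations can be rewritten as

$$E[L_n \mid I] = I(m_n + m_{n-1}), \tag{8.9}$$

$$\mathrm{Var}(L_n \mid I) = I(v_n + v_{n-1}) \tag{8.10}$$

Using that $P(I = 1) = P(\text{arrival before service}) = \frac{\lambda}{\lambda + \mu} = 1 - P(I = 0)$, we
obtain upon taking expectations of both sides of Equation (8.9) that

$$m_n = \frac{\lambda}{\lambda + \mu} [m_n + m_{n-1}]$$

or

$$m_n = \frac{\lambda}{\mu} m_{n-1}$$

Starting with $m_1 = \lambda/\mu$, this yields the result

$$m_n = (\lambda/\mu)^n$$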


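To determine $v_n$, we use the conditional variance formula. Using Equations (8.9)
and (8.10) it gives

$$
\begin{aligned}
v_n &= (v_n + v_{n-1}) E[I] + (m_n + m_{n-1})^2 \mathrm{Var}(I) \\
&= \frac{\lambda}{\lambda + \mu} (v_n + v_{n-1}) + \left[ (\lambda/\mu)^n + (\lambda/\mu)^{n-1} \right]^2 \frac{\lambda}{\lambda + \mu} \frac{\mu}{\lambda + \mu} \\
&= \frac{\lambda}{\lambda + \mu} (v_n + v_{n-1}) + (\lambda/\mu)^{2n-2} \left( \frac{\lambda}{\mu} + 1 \right)^2 \frac{\lambda \mu}{(\lambda + \mu)^2} \\
&= \frac{\lambda}{\lambda + \mu} (v_n + v_{n-1}) + (\lambda/\mu)^{2n-1}
\end{aligned}
$$

Hence,

$$\mu v_n = \lambda v_{n-1} + (\lambda + \mu)(\lambda/\mu)^{2n-1}$$

or, with $\rho = \lambda/\mu$

$$v_n = \rho v_{n-1} + \rho^{2n-1} + \rho^{2n}$$

Therefore,

$$
\begin{aligned}
v_1 &= \rho + \rho^2, \\
v_2 &= \rho^2 + 2\rho^3 + \rho^4, \\
v_3 &= \rho^3 + 2\rho^4 + 2\rho^5 + \rho^6, \\
v_4 &= \rho^4 + 2\rho^5 + 2\rho^6 + 2\rho^7 + \rho^8
\end{aligned}
$$

and, in general,

$$v_n = \rho^n + 2 \sum_{j=n+1}^{2n-1} \rho^j + \rho^{2n}$$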
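The recursions for $m_n$ and $v_n$ are easy to iterate, which gives a quick check on the general formula for $v_n$. In this sketch the function names are illustrative and $\rho = 0.8$ is an arbitrary example value.

```python
def lost_customer_moments(rho, n_max):
    """Return lists m, v with m[n] = E[L_n] and v[n] = Var(L_n), n = 0..n_max."""
    m = [1.0]  # m_0 = 1, the convention used to unify the n = 1 case
    v = [0.0]  # v_0 = 0
    for n in range(1, n_max + 1):
        m.append(rho * m[n - 1])
        v.append(rho * v[n - 1] + rho ** (2 * n - 1) + rho ** (2 * n))
    return m, v


def variance_closed_form(rho, n):
    """v_n = rho^n + 2 * sum_{j=n+1}^{2n-1} rho^j + rho^(2n), for n >= 1."""
    middle = sum(rho ** j for j in range(n + 1, 2 * n))
    return rho ** n + 2 * middle + rho ** (2 * n)


m, v = lost_customer_moments(rho=0.8, n_max=6)
for n in range(1, 7):
    print(n, m[n], v[n], variance_closed_form(0.8, n))
```

The last two columns should agree up to rounding for every $n$.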


8.3.3 Birth and Death Queueing Models 


An exponential queueing system in which the arrival rates and the departure rates
depend on the number of customers in the system is known as a birth and death
queueing model. Let $\lambda_n$ denote the arrival rate and let $\mu_n$ denote the departure
rate when there are $n$ customers in the system. Loosely speaking, when there are
$n$ customers in the system then the time until the next arrival is exponential with
rate $\lambda_n$ and is independent of the time of the next departure, which is exponential
with rate $\mu_n$. Equivalently, and more formally, whenever there are $n$ customers
in the system, the time until either the next arrival or the next departure occurs is
an exponential random variable with rate $\lambda_n + \mu_n$ and, independent of how long
it takes for this occurrence, it will be an arrival with probability $\frac{\lambda_n}{\lambda_n + \mu_n}$. We now
give some examples of birth and death queues.


(a) The M/M/1 Queueing System
Because the arrival rate is always $\lambda$, and the departure rate is $\mu$ when the system is
nonempty, the M/M/1 is a birth and death model with

$$
\begin{aligned}
\lambda_n &= \lambda, \quad n \ge 0 \\
\mu_n &= \mu, \quad n \ge 1
\end{aligned}
$$


(b) The M/M/1 Queueing System with Balking
Consider the M/M/1 system but now suppose that a customer that finds $n$ others
in the system upon its arrival will only join the system with probability $\alpha_n$. (That is,
with probability $1 - \alpha_n$ it balks at joining the system.) Then this system is a birth and
death model with

$$
\begin{aligned}
\lambda_n &= \lambda \alpha_n, \quad n \ge 0 \\
\mu_n &= \mu, \quad n \ge 1
\end{aligned}
$$

The M/M/1 with finite capacity $N$ is the special case where

$$\alpha_n = \begin{cases} 1, & \text{if } n < N \\ 0, & \text{if } n \ge N \end{cases}$$


(c) The M/M/k Queueing System
Consider a $k$ server system in which customers arrive according to a Poisson process
with rate $\lambda$. An arriving customer immediately enters service if any of the $k$ servers are
free. If all $k$ servers are busy, then the arrival joins the queue. When a server completes
a service the customer served departs the system and if there are any customers in
queue then the one who has been waiting longest enters service with that server. All
service times are exponential random variables with rate $\mu$. Because customers are
always arriving at rate $\lambda$,

$$\lambda_n = \lambda, \quad n \ge 0$$

Now, when there are $n \le k$ customers in the system then each customer will be
receiving service and so the time until a departure will be the minimum of $n$ independent
exponentials each having rate $\mu$, and so will be exponential with rate $n\mu$. On
the other hand if there are $n > k$ in the system then only $k$ of them will be in service,
and so the departure rate in this case is $k\mu$. Hence, the M/M/k is a birth and death
queueing model with arrival rates

$$\lambda_n = \lambda, \quad n \ge 0$$

and departure rates

$$\mu_n = \begin{cases} n\mu, & \text{if } n \le k \\ k\mu, & \text{if } n \ge k \end{cases}$$ ■
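To analyze the general birth and death queueing model, let $P_n$ denote the long-run
proportion of time there are $n$ in the system. Then, either as a consequence
of the balance equations given by


state        rate at which process leaves = rate at which process enters
$n = 0$      $\lambda_0 P_0 = \mu_1 P_1$
$n \ge 1$    $(\lambda_n + \mu_n)P_n = \lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1}$


or by directly using the result that the rate at which arrivals find $n$ in the system
is equal to the rate at which departures leave behind $n$, we obtain

$$\lambda_n P_n = \mu_{n+1} P_{n+1}, \quad n \ge 0$$

or, equivalently, that

$$P_{n+1} = \frac{\lambda_n}{\mu_{n+1}} P_n, \quad n \ge 0$$

Thus,

$$
\begin{aligned}
P_0 &= P_0, \\
P_1 &= \frac{\lambda_0}{\mu_1} P_0, \\
P_2 &= \frac{\lambda_1}{\mu_2} P_1 = \frac{\lambda_1 \lambda_0}{\mu_2 \mu_1} P_0, \\
P_3 &= \frac{\lambda_2}{\mu_3} P_2 = \frac{\lambda_2 \lambda_1 \lambda_0}{\mu_3 \mu_2 \mu_1} P_0
\end{aligned}
$$

and, in general

$$P_n = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} P_0, \quad n \ge 1$$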


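Using that $\sum_{n=0}^{\infty} P_n = 1$ shows that

$$1 = P_0 \left[ 1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} \right]$$

Hence,

$$P_0 = \frac{1}{1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}$$

and

$$P_n = \frac{\dfrac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}{1 + \sum_{n=1}^{\infty} \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}, \quad n \ge 1$$

The necessary and sufficient condition for the long-run probabilities to exist is
that the denominator in the preceding is finite. That is, we need have that

$$\sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} < \infty$$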


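Example 8.5 For the M/M/k system

$$\frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} = \begin{cases} \dfrac{(\lambda/\mu)^n}{n!}, & \text{if } n \le k \\[2ex] \dfrac{(\lambda/\mu)^n}{k!\, k^{n-k}}, & \text{if } n > k \end{cases}$$

Hence, using that $\frac{(\lambda/\mu)^n}{k!\, k^{n-k}} = (\lambda/k\mu)^n k^k / k!$ we see that

$$
\begin{aligned}
P_0 &= \frac{1}{1 + \sum_{n=1}^{k} (\lambda/\mu)^n / n! + \sum_{n=k+1}^{\infty} (\lambda/k\mu)^n k^k / k!}, \\
P_n &= P_0 (\lambda/\mu)^n / n!, \quad \text{if } n \le k \\
P_n &= P_0 (\lambda/k\mu)^n k^k / k!, \quad \text{if } n > k
\end{aligned}
$$

It follows from the preceding that the condition needed for the limiting probabilities
to exist is $\lambda < k\mu$. Because $k\mu$ is the service rate when all servers are
busy, the preceding is just the intuitive condition that for limiting probabilities to
exist the service rate needs to be larger than the arrival rate when there are many
customers in the system. ■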
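The general birth and death formula can be evaluated numerically by truncating the state space, and the result compared with the M/M/k closed form of Example 8.5. The sketch below is illustrative: the function names, the truncation level, and the example values $\lambda = 3$, $\mu = 2$, $k = 2$ are all assumptions.

```python
import math


def birth_death_probs(arrival_rate, departure_rate, n_max):
    """Approximate P_0..P_n_max by truncating the state space at n_max.

    arrival_rate(n) and departure_rate(n) return lambda_n and mu_n.
    """
    weights = [1.0]
    for n in range(1, n_max + 1):
        # weight_n = lambda_0 ... lambda_{n-1} / (mu_1 ... mu_n)
        weights.append(weights[-1] * arrival_rate(n - 1) / departure_rate(n))
    total = sum(weights)
    return [w / total for w in weights]


def mmk_probs(lam, mu, k, n_max):
    """Closed-form P_0..P_n_max for the M/M/k queue; requires lam < k * mu."""
    if lam >= k * mu:
        raise ValueError("limiting probabilities require lam < k * mu")
    a = lam / mu
    r = lam / (k * mu)
    head = sum(a ** n / math.factorial(n) for n in range(1, k + 1))
    # Geometric tail: sum over n > k of r^n * k^k / k!
    tail = (k ** k / math.factorial(k)) * r ** (k + 1) / (1 - r)
    p0 = 1 / (1 + head + tail)
    probs = []
    for n in range(n_max + 1):
        if n <= k:
            probs.append(p0 * a ** n / math.factorial(n))
        else:
            probs.append(p0 * r ** n * k ** k / math.factorial(k))
    return probs


lam, mu, k = 3.0, 2.0, 2
numeric = birth_death_probs(lambda n: lam, lambda n: min(n, k) * mu, 400)
print(numeric[:4])
print(mmk_probs(lam, mu, k, 3))
```

Passing the rates as functions keeps the solver general, so the balking model or the finite-capacity queue can be handled by swapping in different `arrival_rate` functions.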


To determine $W$, the average time that a customer spends in the system, for the
birth and death queueing system, we employ the fundamental queueing identity
$L = \lambda_a W$. Because $L$ is the average number of customers in the system,

$$L = \sum_{n=0}^{\infty} n P_n$$

Also, because the arrival rate when there are $n$ in the system is $\lambda_n$ and the proportion
of time in which there are $n$ in the system is $P_n$, we see that the average
arrival rate of customers is

$$\lambda_a = \sum_{n=0}^{\infty} \lambda_n P_n$$

Consequently,

$$W = \frac{\sum_{n=0}^{\infty} n P_n}{\sum_{n=0}^{\infty} \lambda_n P_n}$$

Now consider $a_n$ equal to the proportion of arrivals that find $n$ in the system.
Since arrivals are at rate $\lambda_n$ whenever there are $n$ in system it follows that the rate
at which arrivals find $n$ is $\lambda_n P_n$. Hence, in a large time $T$ approximately $\lambda_n P_n T$ of
the approximately $\lambda_a T$ arrivals will encounter $n$. Letting $T$ go to infinity shows
that the long-run proportion of arrivals finding $n$ in the system is

$$a_n = \frac{\lambda_n P_n}{\lambda_a}$$

Let us now consider the average length of a busy period, where we say that the 
system alternates between idle periods when there are no customers in the system 
and busy periods in which there is at least one customer in the system. Now, an 
idle period begins when the system is empty and ends when the next customer 
arrives. Because the arrival rate when the system is empty is $\lambda_0$, it thus follows
that, independent of all that previously occurred, the length of an idle period is
exponential with rate $\lambda_0$. Because a busy period always begins when there is one
in the system and ends when the system is empty, it is easy to see that the lengths
of successive busy periods are independent and identically distributed. Let $I_j$ and
$B_j$ denote, respectively, the lengths of the $j$th idle and the $j$th busy period, $j \ge 1$.
Now, in the first $\sum_{j=1}^{n} (I_j + B_j)$ time units the system will be empty for a time
$\sum_{j=1}^{n} I_j$. Consequently, $P_0$, the long-run proportion of time in which the system
is empty, can be expressed as

$$
\begin{aligned}
P_0 &= \text{long-run proportion of time empty} \\
&= \lim_{n \to \infty} \frac{I_1 + \cdots + I_n}{I_1 + \cdots + I_n + B_1 + \cdots + B_n} \\
&= \lim_{n \to \infty} \frac{(I_1 + \cdots + I_n)/n}{(I_1 + \cdots + I_n)/n + (B_1 + \cdots + B_n)/n} \\
&= \frac{E[I]}{E[I] + E[B]}
\end{aligned}
\tag{8.11}
$$

where $I$ and $B$ represent, respectively, the lengths of an idle and of a busy period,
and where the final equality follows from the strong law of large numbers. Hence,
using that $E[I] = 1/\lambda_0$, we see that

$$P_0 = \frac{1}{1 + \lambda_0 E[B]}$$

or,

$$E[B] = \frac{1 - P_0}{\lambda_0 P_0} \tag{8.12}$$

For instance, in the M/M/1 queue, this yields $E[B] = \frac{\lambda/\mu}{\lambda(1 - \lambda/\mu)} = \frac{1}{\mu - \lambda}$.

Another quantity of interest is $T_n$, the amount of time during a busy period that
there are $n$ in the system. To determine its mean, note that $E[T_n]$ is the average
amount of time there are $n$ in the system in intervals between successive busy
periods. Because the average time between successive busy periods is $E[B] + E[I]$,
it follows that

$$
\begin{aligned}
P_n &= \text{long-run proportion of time there are } n \text{ in system} \\
&= \frac{E[T_n]}{E[I] + E[B]} \\
&= \frac{E[T_n] P_0}{E[I]} \qquad \text{from (8.11)}
\end{aligned}
$$

Hence,

$$E[T_n] = \frac{P_n}{\lambda_0 P_0} = \frac{\lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}$$

As a check, note that

$$B = \sum_{n=1}^{\infty} T_n$$

and thus,

$$E[B] = \sum_{n=1}^{\infty} E[T_n] = \frac{1}{\lambda_0 P_0} \sum_{n=1}^{\infty} P_n = \frac{1 - P_0}{\lambda_0 P_0}$$

which is in agreement with (8.12).

For the M/M/1 system, the preceding gives $E[T_n] = \lambda^{n-1}/\mu^n$.
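The busy period formulas lend themselves to a short numerical check. The sketch below (function names are illustrative, and $\lambda = 1$, $\mu = 2$ are example values) evaluates Equation (8.12) and the product formula for $E[T_n]$, then confirms for the M/M/1 queue that the $E[T_n]$ add up to $E[B] = 1/(\mu - \lambda)$.

```python
def busy_period_mean(lam0, p0):
    """E[B] from Equation (8.12), given lambda_0 and P_0."""
    return (1 - p0) / (lam0 * p0)


def mean_time_with_n(n, arrival_rate, departure_rate):
    """E[T_n] = lambda_1 ... lambda_{n-1} / (mu_1 ... mu_n), for n >= 1."""
    value = 1.0 / departure_rate(1)
    for j in range(1, n):
        value *= arrival_rate(j) / departure_rate(j + 1)
    return value


lam, mu = 1.0, 2.0
mean_busy = busy_period_mean(lam, 1 - lam / mu)
# Truncated sum of E[T_n]; the omitted terms decay geometrically.
summed = sum(mean_time_with_n(n, lambda j: lam, lambda j: mu) for n in range(1, 200))
print(mean_busy, summed, 1 / (mu - lam))
```

All three printed numbers should coincide up to rounding and truncation error.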

Whereas in exponential birth and death queueing models the state of the system 
is just the number of customers in the system, there are other exponential models 
in which a more detailed state space is needed. To illustrate, we consider some 
examples. 


8.3.4 A Shoe Shine Shop 


Consider a shoe shine shop consisting of two chairs. Suppose that an entering 
customer first will go to chair 1. When his work is completed in chair 1, he will 
go either to chair 2 if that chair is empty or else wait in chair 1 until chair 2 
becomes empty. Suppose that a potential customer will enter this shop as long 
as chair 1 is empty. (Thus, for instance, a potential customer might enter even if 
there is a customer in chair 2.) 

If we suppose that potential customers arrive in accordance with a Poisson
process at rate $\lambda$, and that the service times for the two chairs are independent
and have respective exponential rates of $\mu_1$ and $\mu_2$, then


(a) what proportion of potential customers enters the system? 
(b) what is the mean number of customers in the system? 
(c) what is the average amount of time that an entering customer spends in the system? 


To begin we must first decide upon an appropriate state space. It is clear that 
the state of the system must include more information than merely the number 
of customers in the system. For instance, it would not be enough to specify that 
there is one customer in the system as we would also have to know which chair 
he was in. Further, if we only know that there are two customers in the system, 
then we would not know if the man in chair 1 is still being served or if he is 
just waiting for the person in chair 2 to finish. To account for these points, the 
following state space, consisting of the five states (0,0), (1,0), (0,1), (1,1), and 
(b, 1), will be used. The states have the following interpretation: 


State     Interpretation

(0, 0)    There are no customers in the system.
(1, 0)    There is one customer in the system, and he is in chair 1.
(0, 1)    There is one customer in the system, and he is in chair 2.
(1, 1)    There are two customers in the system, and both are
          presently being served.
(b, 1)    There are two customers in the system, but the customer
          in the first chair has completed his work in that chair and
          is waiting for the second chair to become free.


It should be noted that when the system is in state (b, 1), the person in chair 1, 
though not being served, is nevertheless “blocking” potential arrivals from enter- 
ing the system. 

As a prelude to writing down the balance equations, it is usually worthwhile 
to make a transition diagram. This is done by first drawing a circle for each state 
and then drawing an arrow labeled by the rate at which the process goes from one 
state to another. The transition diagram for this model is shown in Figure 8.1. 
The explanation for the diagram is as follows: The arrow from state (0,0) to 
state (1,0) that is labeled $\lambda$ means that when the process is in state (0, 0), that is,
when the system is empty, then it goes to state (1,0) at a rate $\lambda$, that is, via an
arrival. The arrow from (0, 1) to (1, 1) is similarly explained. 

When the process is in state (1, 0), it will go to state (0,1) when the customer 
in chair 1 is finished and this occurs at a rate $\mu_1$; hence the arrow from (1,0) to
(0,1) labeled $\mu_1$. The arrow from (1, 1) to (b, 1) is similarly explained.

When in state (b,1) the process will go to state (0,1) when the customer in 
chair 2 completes his service (which occurs at rate $\mu_2$); hence the arrow from
(b, 1) to (0, 1) labeled $\mu_2$. Also, when in state (1,1) the process will go to state
(1,0) when the man in chair 2 finishes; hence the arrow from (1,1) to (1,0)
labeled $\mu_2$. Finally, if the process is in state (0,1), then it will go to state (0, 0)
when the man in chair 2 completes his service; hence the arrow from (0,1) to
(0, 0) labeled $\mu_2$.

Because there are no other possible transitions, this completes the transition 
diagram. 


Figure 8.1 A transition diagram. 

To write the balance equations we equate the sum of the arrows (multiplied
by the probability of the states where they originate) coming into a state with the 
sum of the arrows (multiplied by the probability of the state) going out of that 
state. This gives 


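State     Rate that the process leaves = rate that it enters
(0, 0)    $\lambda P_{00} = \mu_2 P_{01}$
(1, 0)    $\mu_1 P_{10} = \lambda P_{00} + \mu_2 P_{11}$
(0, 1)    $(\lambda + \mu_2) P_{01} = \mu_1 P_{10} + \mu_2 P_{b1}$
(1, 1)    $(\mu_1 + \mu_2) P_{11} = \lambda P_{01}$
(b, 1)    $\mu_2 P_{b1} = \mu_1 P_{11}$


These along with the equation

$$P_{00} + P_{10} + P_{01} + P_{11} + P_{b1} = 1$$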


may be solved to determine the limiting probabilities. Though it is easy to solve 
the preceding equations, the resulting solutions are quite involved and hence will 
not be explicitly presented. However, it is easy to answer our questions in terms 
of these limiting probabilities. First, since a potential customer will enter the 
system when the state is either (0, 0) or (0, 1), it follows that the proportion of 
customers entering the system is $P_{00} + P_{01}$. Secondly, since there is one customer
in the system whenever the state is (0, 1) or (1, 0) and two customers in the system
whenever the state is (1, 1) or (b, 1), it follows that $L$, the average number in the
system, is given by

$$L = P_{01} + P_{10} + 2(P_{11} + P_{b1})$$


To derive the average amount of time that an entering customer spends in the 
system, we use the relationship $W = L/\lambda_a$. Since a potential customer will enter
the system when the state is either (0,0) or (0,1), it follows that $\lambda_a = \lambda(P_{00} + P_{01})$
and hence

$$W = \frac{P_{01} + P_{10} + 2(P_{11} + P_{b1})}{\lambda(P_{00} + P_{01})}$$
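
The text does not present the explicit solution, but the balance equations are easy to solve numerically. The following is a minimal sketch, assuming Python; the function name `shoeshine_limits` and the rate values in the usage line are illustrative choices, not part of the original model description. It writes every probability as a multiple of $P_{00}$, normalizes, and then evaluates the three answers derived above.

```python
def shoeshine_limits(lam, mu1, mu2):
    """Limiting probabilities, entering proportion, L, and W for the two-chair model.

    lam is the arrival rate; mu1 and mu2 are the service rates of chairs 1 and 2.
    """
    # Express each probability as a multiple of P00 using four of the five
    # balance equations; the equation for state (0,1) is redundant.
    p00 = 1.0
    p01 = lam * p00 / mu2                  # (0,0): lam*P00 = mu2*P01
    p11 = lam * p01 / (mu1 + mu2)          # (1,1): (mu1+mu2)*P11 = lam*P01
    pb1 = mu1 * p11 / mu2                  # (b,1): mu2*Pb1 = mu1*P11
    p10 = (lam * p00 + mu2 * p11) / mu1    # (1,0): mu1*P10 = lam*P00 + mu2*P11
    total = p00 + p01 + p11 + pb1 + p10    # normalizing constant
    probs = {"00": p00 / total, "10": p10 / total, "01": p01 / total,
             "11": p11 / total, "b1": pb1 / total}
    enter = probs["00"] + probs["01"]                        # proportion entering
    L = probs["01"] + probs["10"] + 2 * (probs["11"] + probs["b1"])
    W = L / (lam * enter)                                    # W = L / lambda_a
    return probs, enter, L, W


probs, enter, L, W = shoeshine_limits(lam=1.0, mu1=1.0, mu2=1.0)
print(probs, enter, L, W)
```

The unused balance equation for state (0, 1) gives a convenient consistency check on the computed probabilities.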


8.3.5 A Queueing System with Bulk Service 


In this model, we consider a single-server exponential queueing system in which 
the server is able to serve two customers at the same time. Whenever the server 
completes a service, she then serves the next two customers at the same time. 
However, if there is only one customer in line, then she serves that customer by 
herself. We shall assume that her service time is exponential at rate $\mu$ whether
she is serving one or two customers. As usual, we suppose that customers arrive
at an exponential rate $\lambda$. One example of such a system might be an elevator or
a cable car that can take at most two passengers at any time. 

It would seem that the state of the system would have to tell us not only how 
many customers there are in the system, but also whether one or two are presently 
being served. However, it turns out that we can more easily solve the problem 
not by concentrating on the number of customers in the system, but rather on the 
number in queue. So let us define the state as the number of customers waiting 
in queue, with two states when there is no one in queue. That is, let us have as a 
state space $0', 0, 1, 2, \ldots$, with the interpretation


State          Interpretation
$0'$           No one in service
$0$            Server busy; no one waiting
$n$, $n > 0$   $n$ customers waiting


The transition diagram is shown in Figure 8.2 and the balance equations are 


State            Rate at which the process leaves = rate at which it enters
$0'$             $\lambda P_{0'} = \mu P_0$
$0$              $(\lambda + \mu) P_0 = \lambda P_{0'} + \mu P_1 + \mu P_2$
$n$, $n \ge 1$   $(\lambda + \mu) P_n = \lambda P_{n-1} + \mu P_{n+2}$


Now the set of equations 

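$$(\lambda + \mu) P_n = \lambda P_{n-1} + \mu P_{n+2}, \quad n = 1, 2, \ldots \qquad (8.13)$$

has a solution of the form

$$P_n = \alpha^n P_0$$

To see this, substitute the preceding in Equation (8.13) to obtain

$$(\lambda + \mu)\alpha^n P_0 = \lambda \alpha^{n-1} P_0 + \mu \alpha^{n+2} P_0$$

or

$$(\lambda + \mu)\alpha = \lambda + \mu \alpha^3$$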


Figure 8.2 A transition diagram. 

Solving this for $\alpha$ yields the following three roots:

$$\alpha = 1, \qquad \alpha = \frac{-1 - \sqrt{1 + 4\lambda/\mu}}{2}, \qquad \alpha = \frac{-1 + \sqrt{1 + 4\lambda/\mu}}{2}$$

As the first two are clearly not possible, it follows that

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$

Hence,

$$P_n = \alpha^n P_0, \qquad P_{0'} = \frac{\mu}{\lambda} P_0$$


where the bottom equation follows from the first balance equation. (We can
ignore the second balance equation as one of these equations is always redundant.)
To obtain $P_0$, we use

$$P_0 + P_{0'} + \sum_{n=1}^{\infty} P_n = 1$$

or

$$P_0\left[1 + \frac{\mu}{\lambda} + \sum_{n=1}^{\infty} \alpha^n\right] = 1$$

or

$$P_0\left[\frac{1}{1 - \alpha} + \frac{\mu}{\lambda}\right] = 1$$

or

$$P_0 = \frac{\lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)}$$

and, thus

$$P_n = \frac{\alpha^n \lambda (1 - \alpha)}{\lambda + \mu(1 - \alpha)}, \quad n \ge 0, \qquad P_{0'} = \frac{\mu(1 - \alpha)}{\lambda + \mu(1 - \alpha)} \qquad (8.14)$$

where

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$

Note that for the preceding to be valid we need $\alpha < 1$, or equivalently $\lambda/\mu < 2$,
which is intuitive since the maximum service rate is $2\mu$, which must be larger
than the arrival rate $\lambda$ to avoid overloading the system.

All the relevant quantities of interest now can be determined. For instance, to 
determine the proportion of customers that are served alone, we first note that 
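the rate at which customers are served alone is $\lambda P_{0'} + \mu P_1$, since when the system
is empty a customer will be served alone upon the next arrival and when there is
one customer in queue he will be served alone upon a departure. As the rate at
which customers are served is $\lambda$, it follows that

$$\text{proportion of customers that are served alone} = \frac{\lambda P_{0'} + \mu P_1}{\lambda} = P_{0'} + \frac{\mu}{\lambda} P_1$$

Also,

$$L_Q = \sum_{n=1}^{\infty} n P_n = \frac{\lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)} \sum_{n=1}^{\infty} n \alpha^n \qquad \text{from Equation (8.14)}$$

$$= \frac{\lambda \alpha}{(1 - \alpha)[\lambda + \mu(1 - \alpha)]} \qquad \text{by the algebraic identity } \sum_{n=1}^{\infty} n \alpha^n = \frac{\alpha}{(1 - \alpha)^2}$$

and

$$W_Q = \frac{L_Q}{\lambda}, \qquad W = W_Q + \frac{1}{\mu}, \qquad L = \lambda W$$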
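
The closed forms above are straightforward to evaluate. Here is a small sketch, assuming Python; the function name `bulk_service` and the dictionary keys are illustrative only. It computes $\alpha$, the probabilities of Equation (8.14), the proportion served alone, and the queueing averages.

```python
import math


def bulk_service(lam, mu):
    """Evaluate the bulk-service formulas for arrival rate lam and service rate mu."""
    if lam >= 2 * mu:
        raise ValueError("a steady state requires lam / mu < 2")
    alpha = (math.sqrt(1 + 4 * lam / mu) - 1) / 2
    denom = lam + mu * (1 - alpha)
    p0 = lam * (1 - alpha) / denom          # server busy, no one waiting
    p0_prime = mu * (1 - alpha) / denom     # no one in service
    p1 = alpha * p0                         # one customer waiting
    alone = p0_prime + (mu / lam) * p1      # proportion served alone
    LQ = lam * alpha / ((1 - alpha) * denom)
    WQ = LQ / lam
    W = WQ + 1 / mu
    L = lam * W
    return {"alpha": alpha, "P0": p0, "P0'": p0_prime,
            "alone": alone, "LQ": LQ, "WQ": WQ, "W": W, "L": L}


print(bulk_service(lam=3.0, mu=4.0))
```

Choosing $\lambda/\mu = 3/4$ makes $\alpha = 1/2$, which keeps the arithmetic easy to check by hand.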

8.4.1 Open Systems


Consider a two-server system in which customers arrive at a Poisson rate $\lambda$ at
server 1. After being served by server 1 they then join the queue in front of server 2.


Figure 8.3 A tandem queue (customers pass through server 1, then server 2, then leave the system).


We suppose there is infinite waiting space at both servers. Each server serves one 
customer at a time with server $i$ taking an exponential time with rate $\mu_i$ for
a service, $i = 1, 2$. Such a system is called a tandem or sequential system (see
Figure 8.3).

To analyze this system we need to keep track of the number of customers at
server 1 and the number at server 2. So let us define the state by the pair $(n, m)$,
meaning that there are $n$ customers at server 1 and $m$ at server 2. The balance
equations are 


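State               Rate that the process leaves = rate that it enters
$0, 0$              $\lambda P_{0,0} = \mu_2 P_{0,1}$
$n, 0;\ n > 0$      $(\lambda + \mu_1) P_{n,0} = \mu_2 P_{n,1} + \lambda P_{n-1,0}$
$0, m;\ m > 0$      $(\lambda + \mu_2) P_{0,m} = \mu_2 P_{0,m+1} + \mu_1 P_{1,m-1}$
$n, m;\ nm > 0$     $(\lambda + \mu_1 + \mu_2) P_{n,m} = \mu_2 P_{n,m+1} + \mu_1 P_{n+1,m-1} + \lambda P_{n-1,m}$     (8.15)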


Rather than directly attempting to solve these (along with the equation 
$\sum_{n,m} P_{n,m} = 1$) we shall guess at a solution and then verify that it indeed satisfies
the preceding. We first note that the situation at server 1 is just as in an M/M/1
model. Similarly, as it was shown in Section 6.6 that the departure process of
an M/M/1 queue is a Poisson process with rate $\lambda$, it follows that what server 2
faces is also an M/M/1 queue. Hence, the probability that there are $n$ customers
at server 1 is

$$P\{n \text{ at server 1}\} = \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right)$$

and, similarly,

$$P\{m \text{ at server 2}\} = \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right)$$

Now, if the numbers of customers at servers 1 and 2 were independent random
variables, then it would follow that

$$P_{n,m} = \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right) \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right) \qquad (8.16)$$

To verify that $P_{n,m}$ is indeed equal to the preceding (and thus that the number of
customers at server 1 is independent of the number at server 2), all we need do is
verify that the preceding satisfies Equations (8.15); this suffices since we know
that the $P_{n,m}$ are the unique solution of Equations (8.15). Now, for instance, if
we consider the first equation of (8.15), we need to show that

$$\lambda \left(1 - \frac{\lambda}{\mu_1}\right) \left(1 - \frac{\lambda}{\mu_2}\right) = \mu_2 \left(1 - \frac{\lambda}{\mu_1}\right) \left(\frac{\lambda}{\mu_2}\right) \left(1 - \frac{\lambda}{\mu_2}\right)$$

which is easily verified. We leave it as an exercise to show that the $P_{n,m}$, as given
by Equation (8.16), satisfy all of the equations of (8.15), and are thus the limiting
probabilities.
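
A numerical spot check is no substitute for the exercise, but it can catch algebra slips. The sketch below, assuming Python, evaluates the product form (8.16) and the residual of the balance equations (8.15) at a grid of states; the function names and the particular rates are illustrative assumptions.

```python
def product_form(n, m, lam, mu1, mu2):
    """P_{n,m} from Equation (8.16); zero for states outside the state space."""
    if n < 0 or m < 0:
        return 0.0
    r1, r2 = lam / mu1, lam / mu2
    return r1 ** n * (1 - r1) * r2 ** m * (1 - r2)


def balance_residual(n, m, lam, mu1, mu2):
    """Rate out of state (n, m) minus rate into it; zero if (8.15) holds."""
    P = lambda a, b: product_form(a, b, lam, mu1, mu2)
    # A server contributes to the leaving rate only when it has a customer.
    rate_out = lam + (mu1 if n > 0 else 0.0) + (mu2 if m > 0 else 0.0)
    rate_in = (mu2 * P(n, m + 1)          # departure from server 2
               + mu1 * P(n + 1, m - 1)    # move from server 1 to server 2
               + lam * P(n - 1, m))       # outside arrival to server 1
    return rate_out * P(n, m) - rate_in


worst = max(abs(balance_residual(n, m, 1.0, 2.0, 3.0))
            for n in range(8) for m in range(8))
print(worst)
```

The single function covers all four cases of (8.15) because terms referring to negative counts vanish.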


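From the preceding we see that $L$, the average number of customers in the
system, is given by

$$L = \sum_{n,m} (n + m) P_{n,m} = \sum_{n} n \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right) + \sum_{m} m \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right) = \frac{\lambda}{\mu_1 - \lambda} + \frac{\lambda}{\mu_2 - \lambda}$$

and from this we see that the average time a customer spends in the system is

$$W = \frac{L}{\lambda} = \frac{1}{\mu_1 - \lambda} + \frac{1}{\mu_2 - \lambda}$$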


Remarks 


(i) The result (Equations (8.15)) could have been obtained as a direct consequence of the 
time reversibility of an M/M/1 (see Section 6.6). For not only does time reversibil- 
ity imply that the output from server 1 is a Poisson process, but it also implies 
(Exercise 26 of Chapter 6) that the number of customers at server 1 is independent 
of the past departure times from server 1. As these past departure times constitute 
the arrival process to server 2, the independence of the numbers of customers in the 
two systems follows. 

(ii) Since a Poisson arrival sees time averages, it follows that in a tandem queue the 
numbers of customers an arrival (to server 1) sees at the two servers are independent 
random variables. However, it should be noted that this does not imply that the 
waiting times of a given customer at the two servers are independent. For a counterexample
suppose that $\lambda$ is very small with respect to $\mu_1 = \mu_2$, and thus almost
all customers have zero wait in queue at both servers. However, given that the wait
in queue of a customer at server 1 is positive, his wait in queue at server 2 also will
be positive with probability at least as large as $\frac{1}{2}$ (why?). Hence, the waiting times
in queue are not independent. Remarkably enough, however, it turns out that the
total times (that is, service time plus wait in queue) that an arrival spends at the two 
servers are indeed independent random variables. 


The preceding result can be substantially generalized. To do so, consider a 
system of $k$ servers. Customers arrive from outside the system to server $i$,
$i = 1, \ldots, k$, in accordance with independent Poisson processes at rate $r_i$; they then
join the queue at $i$ until their turn at service comes. Once a customer is served by
server $i$, he then joins the queue in front of server $j$, $j = 1, \ldots, k$, with probability
$P_{ij}$. Hence, $\sum_{j=1}^{k} P_{ij} \le 1$, and $1 - \sum_{j=1}^{k} P_{ij}$ represents the probability that a
customer departs the system after being served by server $i$.

If we let $\lambda_j$ denote the total arrival rate of customers to server $j$, then the $\lambda_j$ can
be obtained as the solution of

$$\lambda_j = r_j + \sum_{i=1}^{k} \lambda_i P_{ij}, \quad j = 1, \ldots, k \qquad (8.17)$$

Equation (8.17) follows since $r_j$ is the arrival rate of customers to $j$ coming from
outside the system and, as $\lambda_i$ is the rate at which customers depart server $i$ (rate
in must equal rate out), $\lambda_i P_{ij}$ is the arrival rate to $j$ of those coming from server $i$.

It turns out that the number of customers at each of the servers is independent
and of the form

$$P\{n \text{ customers at server } j\} = \left(\frac{\lambda_j}{\mu_j}\right)^n \left(1 - \frac{\lambda_j}{\mu_j}\right), \quad n \ge 0$$

where $\mu_j$ is the exponential service rate at server $j$ and the $\lambda_j$ are the solution
to Equation (8.17). Of course, it is necessary that $\lambda_j/\mu_j < 1$ for all $j$. To prove
this, we first note that it is equivalent to asserting that the limiting probabilities
$P(n_1, n_2, \ldots, n_k) = P\{n_j \text{ at server } j,\ j = 1, \ldots, k\}$ are given by

$$P(n_1, n_2, \ldots, n_k) = \prod_{j=1}^{k} \left(\frac{\lambda_j}{\mu_j}\right)^{n_j} \left(1 - \frac{\lambda_j}{\mu_j}\right) \qquad (8.18)$$

which can be verified by showing that it satisfies the balance equations for this
model.

The average number of customers in the system is

$$L = \sum_{j=1}^{k} \text{average number at server } j = \sum_{j=1}^{k} \frac{\lambda_j}{\mu_j - \lambda_j}$$

The average time a customer spends in the system can be obtained from $L = \lambda W$
with $\lambda = \sum_{j=1}^{k} r_j$. (Why not $\lambda = \sum_{j=1}^{k} \lambda_j$?) This yields

$$W = \frac{\sum_{j=1}^{k} \lambda_j/(\mu_j - \lambda_j)}{\sum_{j=1}^{k} r_j}$$


Remark The result embodied in Equation (8.18) is rather remarkable in that it 
says that the distribution of the number of customers at server i is the same as in 
an M/M/1 system with rates $\lambda_i$ and $\mu_i$. What is remarkable is that in the network
model the arrival process at node i need not be a Poisson process. For if there 
is a possibility that a customer may visit a server more than once (a situation 
called feedback), then the arrival process will not be Poisson. An easy example 
illustrating this is to suppose that there is a single server whose service rate is 
very large with respect to the arrival rate from outside. Suppose also that with 
probability p = 0.9 a customer upon completion of service is fed back into the 
system. Hence, at an arrival time epoch there is a large probability of another 
arrival in a short time (namely, the feedback arrival); whereas at an arbitrary 
time point there will be only a very slight chance of an arrival occurring shortly 
(since $\lambda$ is so very small). Hence, the arrival process does not possess independent
increments and so cannot be Poisson. 

Thus, we see that when feedback is allowed the steady-state probabilities of 
the number of customers at any given station have the same distribution as in an 
M/M/1 model even though the model is not M/M/1. (Presumably such quantities 
as the joint distribution of the number at the station at two different time points 
will not be the same as for an M/M/1.) 


Example 8.6 Consider a system of two servers where customers from outside 
the system arrive at server 1 at a Poisson rate 4 and at server 2 at a Poisson 
rate 5. The service rates of 1 and 2 are respectively 8 and 10. A customer upon 
completion of service at server 1 is equally likely to go to server 2 or to leave the 
system (i.e., $P_{11} = 0$, $P_{12} = \frac{1}{2}$); whereas a departure from server 2 will go 25
percent of the time to server 1 and will depart the system otherwise (i.e., $P_{21} = \frac{1}{4}$,
$P_{22} = 0$). Determine the limiting probabilities, $L$, and $W$.


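Solution: The total arrival rates to servers 1 and 2 (call them $\lambda_1$ and $\lambda_2$) can
be obtained from Equation (8.17). That is, we have

$$\lambda_1 = 4 + \tfrac{1}{4}\lambda_2, \qquad \lambda_2 = 5 + \tfrac{1}{2}\lambda_1$$

implying that

$$\lambda_1 = 6, \qquad \lambda_2 = 8$$

Hence,

$$P\{n \text{ at server 1},\ m \text{ at server 2}\} = \left(\frac{3}{4}\right)^n \frac{1}{4} \left(\frac{4}{5}\right)^m \frac{1}{5}$$

and

$$L = \frac{6}{8 - 6} + \frac{8}{10 - 8} = 7, \qquad W = \frac{L}{9} = \frac{7}{9}$$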
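
For larger networks the traffic equations (8.17) are usually solved by machine. The following is a minimal sketch, assuming Python, that solves them by simple fixed-point iteration (an illustrative choice of method, not one prescribed by the text) and then applies the formulas for $L$ and $W$. The function name `open_network` and its parameters are hypothetical.

```python
def open_network(r, mu, P, iters=500):
    """Solve lam_j = r_j + sum_i lam_i * P[i][j], then return (lam, L, W).

    r: outside arrival rates, mu: service rates, P: routing probabilities.
    The iteration is assumed to converge, which holds when every customer
    eventually leaves the system.
    """
    k = len(r)
    lam = list(r)
    for _ in range(iters):
        lam = [r[j] + sum(lam[i] * P[i][j] for i in range(k)) for j in range(k)]
    if any(lam[j] >= mu[j] for j in range(k)):
        raise ValueError("need lam_j / mu_j < 1 at every server")
    L = sum(lam[j] / (mu[j] - lam[j]) for j in range(k))
    W = L / sum(r)          # L = lambda * W with lambda = total outside rate
    return lam, L, W


# Data of Example 8.6.
lam, L, W = open_network(r=[4.0, 5.0], mu=[8.0, 10.0],
                         P=[[0.0, 0.5], [0.25, 0.0]])
print(lam, L, W)
```

With the data of Example 8.6 the computed rates should agree with $\lambda_1 = 6$, $\lambda_2 = 8$, $L = 7$, and $W = 7/9$ up to rounding.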


8.4.2 Closed Systems 


The queueing systems described in Section 8.4.1 are called open systems since cus- 
tomers are able to enter and depart the system. A system in which new customers 
never enter and existing ones never depart is called a closed system. 

Let us suppose that we have $m$ customers moving among a system of $k$ servers,
where the service times at server $i$ are exponential with rate $\mu_i$, $i = 1, \ldots, k$. When
a customer completes service at server $i$, she then joins the queue in front of server
$j$, $j = 1, \ldots, k$, with probability $P_{ij}$, where we now suppose that $\sum_{j=1}^{k} P_{ij} = 1$ for
all $i = 1, \ldots, k$. That is, $P = [P_{ij}]$ is a Markov transition probability matrix,
which we shall assume is irreducible. Let $\pi = (\pi_1, \ldots, \pi_k)$ denote the stationary
probabilities for this Markov chain; that is, $\pi$ is the unique positive solution of

$$\pi_j = \sum_{i=1}^{k} \pi_i P_{ij}, \qquad \sum_{j=1}^{k} \pi_j = 1 \qquad (8.19)$$


If we denote the average arrival rate (or equivalently the average service completion
rate) at server $j$ by $\lambda_m(j)$, $j = 1, \ldots, k$, then, analogous to Equation (8.17),
the $\lambda_m(j)$ satisfy

$$\lambda_m(j) = \sum_{i=1}^{k} \lambda_m(i) P_{ij}$$

Hence, from (8.19) we can conclude that

$$\lambda_m(j) = \lambda_m \pi_j, \quad j = 1, 2, \ldots, k \qquad (8.20)$$

where

$$\lambda_m = \sum_{j=1}^{k} \lambda_m(j) \qquad (8.21)$$

From Equation (8.21), we see that $\lambda_m$ is the average service completion rate of
the entire system, that is, it is the system throughput rate.*
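If we let $P_m(n_1, n_2, \ldots, n_k)$ denote the limiting probabilities

$$P_m(n_1, n_2, \ldots, n_k) = P\{n_j \text{ customers at server } j,\ j = 1, \ldots, k\}$$

then, by verifying that they satisfy the balance equation, it can be shown that

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} K_m \prod_{j=1}^{k} (\lambda_m(j)/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^{k} n_j = m \\ 0, & \text{otherwise} \end{cases}$$

But from Equation (8.20) we thus obtain

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} C_m \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^{k} n_j = m \\ 0, & \text{otherwise} \end{cases} \qquad (8.22)$$

where

$$C_m = \left[ \sum_{\substack{n_1, \ldots, n_k: \\ \sum_j n_j = m}} \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j} \right]^{-1} \qquad (8.23)$$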

Equation (8.22) is not as useful as we might suppose, for in order to utilize
it we must determine the normalizing constant $C_m$ given by Equation (8.23),
which requires summing the products $\prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}$ over all the feasible vectors
$(n_1, \ldots, n_k)$: $\sum_{j=1}^{k} n_j = m$. Hence, since there are $\binom{m+k-1}{m}$ vectors this is only
computationally feasible for relatively small values of $m$ and $k$.

* We are just using the notation $\lambda_m(j)$ and $\lambda_m$ to indicate the dependence on the number of customers
in the closed system. This will be used in recursive relations we will develop.

We will now present an approach that will enable us to determine recursively
many of the quantities of interest in this model without first computing the normalizing
constants. To begin, consider a customer who has just left server $i$ and
is headed to server $j$, and let us determine the probability of the system as seen by
this customer. In particular, let us determine the probability that this customer
observes, at that moment, $n_l$ customers at server $l$, $l = 1, \ldots, k$, $\sum_{l=1}^{k} n_l = m - 1$.
This is done as follows:

$$
\begin{aligned}
&P\{\text{customer observes } n_l \text{ at server } l,\ l = 1, \ldots, k \mid \text{customer goes from } i \text{ to } j\} \\
&\quad = \frac{P\{\text{state is } (n_1, \ldots, n_i + 1, \ldots, n_j, \ldots, n_k),\ \text{customer goes from } i \text{ to } j\}}{P\{\text{customer goes from } i \text{ to } j\}} \\
&\quad = \frac{P_m(n_1, \ldots, n_i + 1, \ldots, n_j, \ldots, n_k)\, \mu_i P_{ij}}{\sum_{\mathbf{n}:\, \sum_j n_j = m - 1} P_m(n_1, \ldots, n_i + 1, \ldots, n_k)\, \mu_i P_{ij}} \\
&\quad = \frac{(\pi_i/\mu_i) \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}}{K} \qquad \text{from (8.22)} \\
&\quad = C \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}
\end{aligned}
$$

where $C$ does not depend on $n_1, \ldots, n_k$. But because the preceding is a probability
density on the set of vectors $(n_1, \ldots, n_k)$, $\sum_{j=1}^{k} n_j = m - 1$, it follows from (8.22)
that it must equal $P_{m-1}(n_1, \ldots, n_k)$. Hence,


$$P\{\text{customer observes } n_l \text{ at server } l,\ l = 1, \ldots, k \mid \text{customer goes from } i \text{ to } j\} = P_{m-1}(n_1, \ldots, n_k), \quad \sum_{i=1}^{k} n_i = m - 1 \qquad (8.24)$$


As (8.24) is true for all i, we thus have proven the following proposition, known 
as the arrival theorem. 


Proposition 8.3 (The Arrival Theorem) In the closed network system with m cus- 
tomers, the system as seen by arrivals to server j is distributed as the stationary 
distribution in the same network system when there are only m — 1 customers. 


Denote by $L_m(j)$ and $W_m(j)$ the average number of customers and the average
time a customer spends at server $j$ when there are $m$ customers in the network.
Upon conditioning on the number of customers found at server $j$ by an arrival to
that server, it follows that

$$W_m(j) = \frac{1 + E_m[\text{number at server } j \text{ as seen by an arrival}]}{\mu_j} = \frac{1 + L_{m-1}(j)}{\mu_j} \qquad (8.25)$$

where the last equality follows from the arrival theorem. Now when there are
$m - 1$ customers in the system, then, from Equation (8.20), $\lambda_{m-1}(j)$, the average
arrival rate to server $j$, satisfies

$$\lambda_{m-1}(j) = \lambda_{m-1} \pi_j$$

Now, applying the basic cost identity Equation (8.1) with the cost rule being that
each customer in the network system of $m - 1$ customers pays one per unit time
while at server $j$, we obtain

$$L_{m-1}(j) = \lambda_{m-1} \pi_j W_{m-1}(j) \qquad (8.26)$$

Using Equation (8.25), this yields

$$W_m(j) = \frac{1 + \lambda_{m-1} \pi_j W_{m-1}(j)}{\mu_j} \qquad (8.27)$$

Also using the fact that $\sum_{j=1}^{k} L_{m-1}(j) = m - 1$ (why?) we obtain, from
Equation (8.26), the following:

$$m - 1 = \lambda_{m-1} \sum_{j=1}^{k} \pi_j W_{m-1}(j)$$

or

$$\lambda_{m-1} = \frac{m - 1}{\sum_{i=1}^{k} \pi_i W_{m-1}(i)} \qquad (8.28)$$

Hence, from Equation (8.27), we obtain the recursion

$$W_m(j) = \frac{1}{\mu_j} + \frac{(m - 1) \pi_j W_{m-1}(j)}{\mu_j \sum_{i=1}^{k} \pi_i W_{m-1}(i)} \qquad (8.29)$$


Starting with the stationary probabilities $\pi_j$, $j = 1, \ldots, k$, and $W_1(j) = 1/\mu_j$
we can now use Equation (8.29) to determine recursively $W_2(j), W_3(j), \ldots, W_m(j)$.
We can then determine the throughput rate $\lambda_m$ by using Equation (8.28),
and this will determine $L_m(j)$ by Equation (8.26). This recursive approach is
called mean value analysis.
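
The recursion translates almost line for line into code. The following is a minimal sketch, assuming Python; the function name `mean_value_analysis` and its argument layout are illustrative. It iterates Equation (8.29) from $W_1(j) = 1/\mu_j$, then applies Equations (8.28) and (8.26) with $m$ customers to obtain the throughput and the mean queue lengths.

```python
def mean_value_analysis(pi, mu, m):
    """Return (W, lam, L) for a closed network with m >= 1 customers.

    pi: stationary probabilities of the routing chain, mu: service rates.
    W[j] = W_m(j), lam = throughput lambda_m, L[j] = L_m(j).
    """
    k = len(pi)
    W = [1.0 / mu[j] for j in range(k)]            # W_1(j) = 1 / mu_j
    for n in range(2, m + 1):
        denom = sum(pi[i] * W[i] for i in range(k))
        # Equation (8.29) with n customers in the network.
        W = [1.0 / mu[j] + (n - 1) * pi[j] * W[j] / (mu[j] * denom)
             for j in range(k)]
    lam = m / sum(pi[i] * W[i] for i in range(k))  # Equation (8.28)
    L = [lam * pi[j] * W[j] for j in range(k)]     # Equation (8.26)
    return W, lam, L


# Two servers visited cyclically (pi = (1/2, 1/2)) with two customers.
W, lam, L = mean_value_analysis(pi=[0.5, 0.5], mu=[1.0, 2.0], m=2)
print(W, lam, L)
```

Because the work per step is proportional to $k$, the whole computation costs on the order of $mk$ operations, in contrast to the $\binom{m+k-1}{m}$ terms needed for the normalizing constant.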


Example 8.7 Consider a k-server network in which the customers move in a 
cyclic permutation. That is, 


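$$P_{i,i+1} = 1, \quad i = 1, 2, \ldots, k - 1, \qquad P_{k,1} = 1$$

Let us determine the average number of customers at server $j$ when there are two
customers in the system. Now, for this network,

$$\pi_i = 1/k, \quad i = 1, \ldots, k$$

and as

$$W_1(j) = \frac{1}{\mu_j}$$

we obtain from Equation (8.29) that

$$W_2(j) = \frac{1}{\mu_j} + \frac{(1/k)(1/\mu_j)}{\mu_j \sum_{i=1}^{k} (1/k)(1/\mu_i)} = \frac{1}{\mu_j} + \frac{1}{\mu_j^2 \sum_{i=1}^{k} 1/\mu_i}$$

Hence, from Equation (8.28),

$$\lambda_2 = \frac{2}{\sum_{l=1}^{k} \frac{1}{k} W_2(l)} = \frac{2k}{\sum_{l=1}^{k} \left( \frac{1}{\mu_l} + \frac{1}{\mu_l^2 \sum_{i=1}^{k} 1/\mu_i} \right)}$$

and finally, using Equation (8.26),

$$L_2(j) = \lambda_2 \frac{1}{k} W_2(j) = \frac{2 \left( \frac{1}{\mu_j} + \frac{1}{\mu_j^2 \sum_{i=1}^{k} 1/\mu_i} \right)}{\sum_{l=1}^{k} \left( \frac{1}{\mu_l} + \frac{1}{\mu_l^2 \sum_{i=1}^{k} 1/\mu_i} \right)}$$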

Another approach to learning about the stationary probabilities specified by 
Equation (8.22), which finesses the computational difficulties of computing the 
constant $C_m$, is to use the Gibbs sampler of Section 4.9 to generate a Markov
chain having these stationary probabilities. To begin, note that since there are
always a total of $m$ customers in the system, Equation (8.22) may equivalently
be written as a joint mass function of the numbers of customers at each of the
servers $1, \ldots, k - 1$, as follows:

$$P_m(n_1, \ldots, n_{k-1}) = C_m (\pi_k/\mu_k)^{m - \sum_{j=1}^{k-1} n_j} \prod_{j=1}^{k-1} (\pi_j/\mu_j)^{n_j} = K \prod_{j=1}^{k-1} a_j^{n_j}, \qquad \sum_{j=1}^{k-1} n_j \le m$$

where $a_j = (\pi_j \mu_k)/(\pi_k \mu_j)$, $j = 1, \ldots, k - 1$. Now, if $N = (N_1, \ldots, N_{k-1})$ has the
preceding joint mass function then

$$P\{N_i = n \mid N_1 = n_1, \ldots, N_{i-1} = n_{i-1}, N_{i+1} = n_{i+1}, \ldots, N_{k-1} = n_{k-1}\} = \frac{P_m(n_1, \ldots, n_{i-1}, n, n_{i+1}, \ldots, n_{k-1})}{\sum_r P_m(n_1, \ldots, n_{i-1}, r, n_{i+1}, \ldots, n_{k-1})} = C a_i^n, \qquad n \le m - \sum_{j \ne i} n_j$$

It follows from the preceding that we may use the Gibbs sampler to generate
the values of a Markov chain whose limiting probability mass function is
$P_m(n_1, \ldots, n_{k-1})$ as follows:


1. Let $(n_1, \ldots, n_{k-1})$ be arbitrary nonnegative integers satisfying $\sum_{j=1}^{k-1} n_j \le m$.
2. Generate a random variable $I$ that is equally likely to be any of $1, \ldots, k - 1$.
3. If $I = i$, set $s = m - \sum_{j \ne i} n_j$, and generate the value of a random variable $X$ having
probability mass function

$$P\{X = n\} = C a_i^n, \quad n = 0, \ldots, s$$

4. Let $n_I = X$ and go to step 2.


The successive values of the state vector $(n_1, \ldots, n_{k-1}, m - \sum_{j=1}^{k-1} n_j)$ constitute
the sequence of states of a Markov chain with the limiting distribution $P_m$. All
quantities of interest can be estimated from this sequence. For instance, the average
of the values of the $j$th coordinate of these vectors will converge to the mean
number of individuals at station $j$, the proportion of vectors whose $j$th coordinate
is less than $r$ will converge to the limiting probability that the number of
individuals at station $j$ is less than $r$, and so on.
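
The four steps above can be sketched as follows, assuming Python and at least two servers. The function name `gibbs_closed_network`, the fixed seed, and the choice to start from the all-zero vector are illustrative assumptions; the sketch returns only the estimated mean number at each server.

```python
import random


def gibbs_closed_network(pi, mu, m, steps, seed=0):
    """Estimate the mean number of customers at each of k >= 2 servers."""
    rng = random.Random(seed)
    k = len(pi)
    # a_j = (pi_j * mu_k) / (pi_k * mu_j) for j = 1, ..., k-1.
    a = [pi[j] * mu[k - 1] / (pi[k - 1] * mu[j]) for j in range(k - 1)]
    n = [0] * (k - 1)                 # step 1: a feasible starting vector
    totals = [0] * k
    for _ in range(steps):
        i = rng.randrange(k - 1)      # step 2: pick a coordinate uniformly
        s = m - (sum(n) - n[i])       # step 3: largest value n_i may take
        weights = [a[i] ** x for x in range(s + 1)]
        n[i] = rng.choices(range(s + 1), weights=weights)[0]   # step 4
        state = n + [m - sum(n)]      # append the count at server k
        for j in range(k):
            totals[j] += state[j]
    return [t / steps for t in totals]


print(gibbs_closed_network(pi=[0.5, 0.5], mu=[1.0, 2.0], m=2, steps=20000))
```

In practice one would usually discard an initial stretch of the chain before averaging; that refinement is omitted here to keep the sketch short.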

Other quantities of interest can also be obtained from the simulation. For 
instance, suppose we want to estimate $W_j$, the average amount of time a customer
spends at server $j$ on each visit. Then, as noted in the preceding, $L_j$, the average
number of customers at server $j$, can be estimated. To estimate $W_j$, we use the
identity

$$L_j = \lambda_j W_j$$

where $\lambda_j$ is the rate at which customers arrive at server $j$. Setting $\lambda_j$ equal to the
service completion rate at server $j$ shows that

$$\lambda_j = P\{j \text{ is busy}\}\, \mu_j$$

Using the Gibbs sampler simulation to estimate $P\{j \text{ is busy}\}$ then leads to an
estimator of $W_j$.

8.5 The System M/G/1


8.5.1 Preliminaries: Work and Another Cost Identity 


For an arbitrary queueing system, let us define the work in the system at any time 
t to be the sum of the remaining service times of all customers in the system at 
time t. For instance, suppose there are three customers in the system—the one in 
service having been there for three of his required five units of service time, and 
both people in queue having service times of six units. Then the work at that time 
is $2 + 6 + 6 = 14$. Let $V$ denote the (time) average work in the system.

Now recall the fundamental cost equation (8.1), which states that the

average rate at which the system earns
    $= \lambda_a \times$ average amount a customer pays

and consider the following cost rule: Each customer pays at a rate of $y$ per unit time
when his remaining service time is $y$, whether he is in queue or in service. Thus,
the rate at which the system earns is just the work in the system; so the basic
identity yields

$$V = \lambda_a E[\text{amount paid by a customer}]$$

Now, let $S$ and $W_Q^*$ denote respectively the service time and the time a given
customer spends waiting in queue. Then, since the customer pays at a constant
rate of $S$ per unit time while he waits in queue and at a rate of $S - x$ after spending
an amount of time $x$ in service, we have

$$E[\text{amount paid by a customer}] = E\left[ S W_Q^* + \int_0^S (S - x)\, dx \right]$$

and thus

$$V = \lambda_a E[S W_Q^*] + \frac{\lambda_a E[S^2]}{2} \qquad (8.30)$$
It should be noted that the preceding is a basic queueing identity (like Equations
(8.2)-(8.4)) and as such is valid in almost all models. In addition, if a
customer’s service time is independent of his wait in queue (as is usually, but not
always the case),* then we have from Equation (8.30) that

$$V = \lambda_a E[S] W_Q + \frac{\lambda_a E[S^2]}{2} \qquad (8.31)$$


* For an example where it is not true, see Section 8.6.2.

8.5.2 Application of Work to M/G/1


The M/G/1 model assumes (i) Poisson arrivals at rate $\lambda$; (ii) a general service
distribution; and (iii) a single server. In addition, we will suppose that customers 
are served in the order of their arrival. 

Now, for an arbitrary customer in an M/G/1 system, 


customer’s wait in queue = work in the system when he arrives (8.32) 


This follows since there is only a single server (think about it!). Taking expecta- 
tions of both sides of Equation (8.32) yields 


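$$W_Q = \text{average work as seen by an arrival}$$

But, due to Poisson arrivals, the average work as seen by an arrival will equal $V$,
the time average work in the system. Hence, for the model M/G/1,

$$W_Q = V$$

The preceding in conjunction with the identity

$$V = \lambda E[S] W_Q + \frac{\lambda E[S^2]}{2}$$

yields the so-called Pollaczek-Khintchine formula,

$$W_Q = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} \qquad (8.33)$$

where $E[S]$ and $E[S^2]$ are the first two moments of the service distribution.
The quantities $L$, $L_Q$, and $W$ can be obtained from Equation (8.33) as

$$
\begin{aligned}
L_Q &= \lambda W_Q = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])}, \\
W &= W_Q + E[S] = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} + E[S], \qquad (8.34) \\
L &= \lambda W = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S]
\end{aligned}
$$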
Remarks 


(i) For the preceding quantities to be finite, we need $\lambda E[S] < 1$. This condition is
intuitive since we know from renewal theory that if the server was always busy, then
the departure rate would be $1/E[S]$ (see Section 7.3), which must be larger than the
arrival rate $\lambda$ to keep things finite.

(ii) Since $E[S^2] = \mathrm{Var}(S) + (E[S])^2$, we see from Equations (8.33) and (8.34) that, for
fixed mean service time, $L$, $L_Q$, $W$, and $W_Q$ all increase as the variance of the service
distribution increases.

(iii) Another approach to obtain $W_Q$ is presented in Exercise 38.
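
As a quick illustration of Equations (8.33) and (8.34) and of remark (ii), the following sketch, assuming Python, evaluates the four averages from the arrival rate and the first two service moments. The function name `mg1_measures` and the sample rates are illustrative. The usage lines compare exponential service with constant service of the same mean.

```python
def mg1_measures(lam, ES, ES2):
    """Return (WQ, LQ, W, L) for an M/G/1 queue.

    lam: Poisson arrival rate; ES, ES2: first two moments of service time.
    """
    if lam * ES >= 1:
        raise ValueError("finite averages require lam * E[S] < 1")
    WQ = lam * ES2 / (2 * (1 - lam * ES))   # Pollaczek-Khintchine, (8.33)
    LQ = lam * WQ
    W = WQ + ES
    L = lam * W
    return WQ, LQ, W, L


# Exponential service at rate 2: E[S] = 1/2, E[S^2] = 2 / 2**2 = 1/2.
print(mg1_measures(lam=1.0, ES=0.5, ES2=0.5))
# Constant service time 1/2: E[S^2] = 1/4, so the queueing delay is halved.
print(mg1_measures(lam=1.0, ES=0.5, ES2=0.25))
```

The exponential case reproduces the familiar M/M/1 values, which is a useful sanity check on the formula.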


8.5.3 Busy Periods 


The system alternates between idle periods (when there are no customers in the 
system, and so the server is idle) and busy periods (when there is at least one 
customer in the system, and so the server is busy). 

Let I and B represent, respectively, the length of an idle and of a busy period. 
Because I represents the time from when a customer departs and leaves the system 
empty until the next arrival, it follows, since arrivals are according to a Poisson 
process with rate $\lambda$, that $I$ is exponential with rate $\lambda$ and thus

$$E[I] = \frac{1}{\lambda} \qquad (8.35)$$

To determine $E[B]$ we argue, as in Section 8.3.3, that the long-run proportion
of time the system is empty is equal to the ratio of $E[I]$ to $E[I] + E[B]$. That is,

$$P_0 = \frac{E[I]}{E[I] + E[B]} \qquad (8.36)$$

To compute $P_0$, we note from Equation (8.4) (obtained from the fundamental
cost equation by supposing that a customer pays at a rate of one per unit time
while in service) that

$$\text{average number of busy servers} = \lambda E[S]$$

However, as the left-hand side of the preceding equals $1 - P_0$ (why?), we have

$$P_0 = 1 - \lambda E[S] \qquad (8.37)$$

and, from Equations (8.35)-(8.37),

$$1 - \lambda E[S] = \frac{1/\lambda}{1/\lambda + E[B]}$$

or

$$E[B] = \frac{E[S]}{1 - \lambda E[S]}$$

Another quantity of interest is $C$, the number of customers served in a busy
period. The mean of $C$ can be computed by noting that, on the average, for every
$E[C]$ arrivals exactly one will find the system empty (namely, the first customer
in the busy period). Hence,

$$a_0 = \frac{1}{E[C]}$$

and, as $a_0 = P_0 = 1 - \lambda E[S]$ because of Poisson arrivals, we see that

$$E[C] = \frac{1}{1 - \lambda E[S]}$$


8.6 Variations on the M/G/1 


8.6.1 The M/G/1 with Random-Sized Batch Arrivals 


Suppose that, as in the M/G/1, arrivals occur in accordance with a Poisson 
process having rate $\lambda$. But now suppose that each arrival consists not of a single
customer but of a random number of customers. As before there is a single server 
whose service times have distribution G. 

Let us denote by $\alpha_j$, $j \ge 1$, the probability that an arbitrary batch consists
of $j$ customers; and let $N$ denote a random variable representing the size of a
batch and so $P\{N = j\} = \alpha_j$. Since $\lambda_a = \lambda E(N)$, the basic formula for work
(Equation (8.31)) becomes

$$V = \lambda E[N] \left[ E[S] W_Q + \frac{E[S^2]}{2} \right] \qquad (8.38)$$

To obtain a second equation relating $V$ to $W_Q$, consider an average customer.
We have that 


his wait in queue = work in system when he arrives 


+ his waiting time due to those in his batch 


Taking expectations and using the fact that Poisson arrivals see time averages 
yields 


$$W_Q = V + E[\text{waiting time due to those in his batch}] = V + E[W_B] \qquad (8.39)$$

Now, $E[W_B]$ can be computed by conditioning on the number in the batch, but
we must be careful because the probability that our average customer comes from
a batch of size $j$ is not $\alpha_j$. For $\alpha_j$ is the proportion of batches that are of size $j$, and
if we pick a customer at random, it is more likely that he comes from a larger
rather than a smaller batch. (For instance, suppose $\alpha_1 = \alpha_{100} = \frac{1}{2}$; then half the
batches are of size 1 but 100/101 of the customers will come from a batch of
size 100!)

To determine the probability that our average customer came from a batch
of size $j$ we reason as follows: Let $M$ be a large number. Then of the first $M$
batches approximately $M\alpha_j$ will be of size $j$, $j \ge 1$, and thus there would have
been approximately $jM\alpha_j$ customers that arrived in a batch of size $j$. Hence, the
proportion of arrivals in the first $M$ batches that were from batches of size $j$ is
approximately $jM\alpha_j / \sum_j jM\alpha_j$. This proportion becomes exact as $M \to \infty$, and
so we see that

$$\text{proportion of customers from batches of size } j = \frac{j\alpha_j}{\sum_j j\alpha_j} = \frac{j\alpha_j}{E[N]}$$


We are now ready to compute $E[W_B]$, the expected wait in queue due to others
in the batch:

$$E[W_B] = \sum_j E[W_B \mid \text{batch of size } j]\, \frac{j\alpha_j}{E[N]} \qquad (8.40)$$

Now if there are $j$ customers in his batch, then our customer would have to wait
for $i - 1$ of them to be served if he was $i$th in line among his batch members. As
he is equally likely to be either 1st, 2nd, ..., or $j$th in line we see that

$$E[W_B \mid \text{batch is of size } j] = \sum_{i=1}^{j} (i - 1) E[S] \frac{1}{j} = \frac{j - 1}{2} E[S]$$

Substituting this in Equation (8.40) yields

$$E[W_B] = \frac{E[S]}{2E[N]} \sum_j (j - 1) j \alpha_j = \frac{E[S](E[N^2] - E[N])}{2E[N]}$$

and from Equations (8.38) and (8.39) we obtain

$$W_Q = \frac{E[S](E[N^2] - E[N])/2E[N] + \lambda E[N] E[S^2]/2}{1 - \lambda E[N] E[S]}$$


Remarks 


(i) Note that the condition for $W_Q$ to be finite is that

$$\lambda E[N] < \frac{1}{E[S]}$$

which again says that the arrival rate must be less than the service rate (when the server is busy).

(ii) For a fixed value of $E[N]$, $W_Q$ is increasing in $\text{Var}(N)$, again indicating that “single-server queues do not like variation.”

(iii) The other quantities $L$, $L_Q$, and $W$ can be obtained by using

$$W = W_Q + E[S], \qquad L = \lambda_a W = \lambda E[N]W, \qquad L_Q = \lambda E[N]W_Q$$
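As a sketch of how these batch-arrival formulas might be evaluated, the following Python function computes $W_Q$, $W$, $L$, and $L_Q$ from the batch arrival rate, the batch-size distribution, and the first two moments of the service time. The function name, argument names, and the dictionary form of the batch-size pmf are assumptions made for illustration.

```python
def batch_mg1_measures(lam, alpha, ES, ES2):
    """lam: batch arrival rate; alpha: {j: P(batch size = j)}; ES, ES2: E[S], E[S^2]."""
    EN = sum(j * a for j, a in alpha.items())       # E[N]
    EN2 = sum(j * j * a for j, a in alpha.items())  # E[N^2]
    load = lam * EN * ES
    if load >= 1:
        raise ValueError("need lam * E[N] * E[S] < 1 for a finite W_Q")
    WQ = (ES * (EN2 - EN) / (2 * EN) + lam * EN * ES2 / 2) / (1 - load)
    W = WQ + ES
    return {"WQ": WQ, "W": W, "L": lam * EN * W, "LQ": lam * EN * WQ}

# With batches of size 1 the result reduces to the ordinary M/G/1 formula.
single = batch_mg1_measures(0.5, {1: 1.0}, ES=1.0, ES2=2.0)
pairs = batch_mg1_measures(0.2, {2: 1.0}, ES=1.0, ES2=2.0)
```

Comparing `single` and `pairs` illustrates remark (ii): both have the same customer arrival rate per unit of load, yet batching adds the within-batch wait $E[W_B]$.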


8.6.2 Priority Queues 


Priority queueing systems are ones in which customers are classified into types and then given service priority according to their type. Consider the situation where there are two types of customers, which arrive according to independent Poisson processes with respective rates $\lambda_1$ and $\lambda_2$, and have service distributions $G_1$ and $G_2$. We suppose that type 1 customers are given service priority, in that service will never begin on a type 2 customer if a type 1 is waiting. However, if a type 2 is being served and a type 1 arrives, we assume that the service of the type 2 is continued until completion. That is, there is no preemption once service has begun.

Let $W_Q^i$ denote the average wait in queue of a type $i$ customer, $i = 1, 2$. Our objective is to compute the $W_Q^i$.

First, note that the total work in the system at any time would be exactly the 
same no matter what priority rule was employed (as long as the server is always 
busy whenever there are customers in the system). This is so since the work will 
always decrease at a rate of one per unit time when the server is busy (no matter 
who is in service) and will always jump by the service time of an arrival. Hence, 
the work in the system is exactly as it would be if there was no priority rule but 
rather a first-come, first-served (called FIFO) ordering. However, under FIFO the preceding model is just M/G/1 with

$$\lambda = \lambda_1 + \lambda_2, \qquad G(x) = \frac{\lambda_1}{\lambda}G_1(x) + \frac{\lambda_2}{\lambda}G_2(x) \tag{8.41}$$


which follows since the combination of two independent Poisson processes is 
itself a Poisson process whose rate is the sum of the rates of the component 
processes. The service distribution G can be obtained by conditioning on which 
priority class the arrival is from—as is done in Equation (8.41). 

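Hence, from the results of Section 8.5, it follows that $V$, the average work in the priority queueing system, is given by

$$\begin{aligned}
V &= \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} \\
&= \frac{\lambda\big((\lambda_1/\lambda)E[S_1^2] + (\lambda_2/\lambda)E[S_2^2]\big)}{2\big[1 - \lambda\big((\lambda_1/\lambda)E[S_1] + (\lambda_2/\lambda)E[S_2]\big)\big]} \\
&= \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1] - \lambda_2 E[S_2])}
\end{aligned} \tag{8.42}$$

where $S_i$ has distribution $G_i$, $i = 1, 2$.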

Continuing in our quest for $W_Q^i$ let us note that $S$ and $W_Q^*$, the service and wait in queue of an arbitrary customer, are not independent in the priority model since knowledge about $S$ gives us information as to the type of customer, which in turn gives us information about $W_Q^*$. To get around this we will compute separately the average amount of type 1 and type 2 work in the system. Denoting $V^i$ as the average amount of type $i$ work we have, exactly as in Section 8.5.1,

$$V^i = \lambda_i E[S_i] W_Q^i + \frac{\lambda_i E[S_i^2]}{2}, \quad i = 1, 2 \tag{8.43}$$

If we define

$$V_Q^i \equiv \lambda_i E[S_i] W_Q^i, \qquad V_S^i \equiv \frac{\lambda_i E[S_i^2]}{2}$$

then we may interpret $V_Q^i$ as the average amount of type $i$ work in queue, and $V_S^i$ as the average amount of type $i$ work in service (why?).
Now we are ready to compute $W_Q^1$. To do so, consider an arbitrary type 1 arrival. Then

his delay = amount of type 1 work in the system when he arrives
          + amount of type 2 work in service when he arrives

Taking expectations and using the fact that Poisson arrivals see time averages yields

$$\begin{aligned}
W_Q^1 &= V^1 + V_S^2 \\
&= \lambda_1 E[S_1] W_Q^1 + \frac{\lambda_1 E[S_1^2]}{2} + \frac{\lambda_2 E[S_2^2]}{2}
\end{aligned} \tag{8.44}$$

or

$$W_Q^1 = \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1])} \tag{8.45}$$


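To obtain $W_Q^2$, we first note that since $V = V^1 + V^2$, we have from Equations (8.42) and (8.43) that

$$\begin{aligned}
\frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1] - \lambda_2 E[S_2])}
&= \lambda_1 E[S_1] W_Q^1 + \lambda_2 E[S_2] W_Q^2 + \frac{\lambda_1 E[S_1^2]}{2} + \frac{\lambda_2 E[S_2^2]}{2} \\
&= W_Q^1 + \lambda_2 E[S_2] W_Q^2 \quad \text{from Equation (8.44)}
\end{aligned}$$

Now, using Equation (8.45), we obtain

$$\lambda_2 E[S_2] W_Q^2 = \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2} \left[ \frac{1}{1 - \lambda_1 E[S_1] - \lambda_2 E[S_2]} - \frac{1}{1 - \lambda_1 E[S_1]} \right]$$

or

$$W_Q^2 = \frac{\lambda_1 E[S_1^2] + \lambda_2 E[S_2^2]}{2(1 - \lambda_1 E[S_1] - \lambda_2 E[S_2])(1 - \lambda_1 E[S_1])} \tag{8.46}$$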
Remarks 


(i) Note that from Equation (8.45), the condition for $W_Q^1$ to be finite is that $\lambda_1 E[S_1] < 1$, which is independent of the type 2 parameters. (Is this intuitive?) For $W_Q^2$ to be finite, we need, from Equation (8.46), that

$$\lambda_1 E[S_1] + \lambda_2 E[S_2] < 1$$

Since the arrival rate of all customers is $\lambda = \lambda_1 + \lambda_2$, and the average service time of a customer is $(\lambda_1/\lambda)E[S_1] + (\lambda_2/\lambda)E[S_2]$, the preceding condition is just that the average arrival rate be less than the average service rate.

(ii) If there are $n$ types of customers, we can solve for $V^j$, $j = 1, \ldots, n$, in a similar fashion. First, note that the total amount of work in the system of customers of types $1, \ldots, j$ is independent of the internal priority rule concerning types $1, \ldots, j$ and only depends on the fact that each of them is given priority over any customers of types $j + 1, \ldots, n$. (Why is this? Reason it out!) Hence, $V^1 + \cdots + V^j$ is the same as it would be if types $1, \ldots, j$ were considered as a single type I priority class and types $j + 1, \ldots, n$ as a single type II priority class. Now, from Equations (8.43) and (8.45),

$$V^{I} = \frac{\lambda_I E[S_I^2] + \lambda_I \lambda_{II} E[S_I] E[S_{II}^2]}{2(1 - \lambda_I E[S_I])}$$

where

$$\begin{aligned}
\lambda_I &= \lambda_1 + \cdots + \lambda_j, \\
\lambda_{II} &= \lambda_{j+1} + \cdots + \lambda_n, \\
E[S_I] &= \sum_{i=1}^{j} \frac{\lambda_i}{\lambda_I} E[S_i], \\
E[S_I^2] &= \sum_{i=1}^{j} \frac{\lambda_i}{\lambda_I} E[S_i^2], \\
E[S_{II}^2] &= \sum_{i=j+1}^{n} \frac{\lambda_i}{\lambda_{II}} E[S_i^2]
\end{aligned}$$

Hence, as $V^{I} = V^1 + \cdots + V^j$, we have an expression for $V^1 + \cdots + V^j$, for each $j = 1, \ldots, n$, which then can be solved for the individual $V^1, V^2, \ldots, V^n$. We now can obtain $W_Q^i$ from Equation (8.43). The result of all this (which we leave for an exercise) is that

$$W_Q^i = \frac{\lambda_1 E[S_1^2] + \cdots + \lambda_n E[S_n^2]}{2\prod_{j=i-1}^{i}(1 - \lambda_1 E[S_1] - \cdots - \lambda_j E[S_j])}, \quad i = 1, \ldots, n \tag{8.47}$$
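Equation (8.47) is straightforward to evaluate once the partial loads $\lambda_1 E[S_1] + \cdots + \lambda_j E[S_j]$ are accumulated. The sketch below is one way to do so in Python; the function name `priority_waits` and the list-based inputs are illustrative assumptions, and the factor for $j = 0$ in the product is taken to be 1 (an empty sum).

```python
def priority_waits(lams, ES, ES2):
    """Return [W_Q^1, ..., W_Q^n] for nonpreemptive priorities (type 1 highest), per (8.47)."""
    total = sum(l * m2 for l, m2 in zip(lams, ES2))  # lam_1 E[S_1^2] + ... + lam_n E[S_n^2]
    waits, sigma_prev = [], 0.0
    for l, m1 in zip(lams, ES):
        sigma = sigma_prev + l * m1  # lam_1 E[S_1] + ... + lam_i E[S_i]
        if sigma >= 1:
            raise ValueError("load through this class is >= 1, so W_Q is infinite")
        waits.append(total / (2 * (1 - sigma_prev) * (1 - sigma)))
        sigma_prev = sigma
    return waits

# Two types with exponential(1) service times: E[S_i] = 1, E[S_i^2] = 2.
w1, w2 = priority_waits([0.2, 0.3], [1.0, 1.0], [2.0, 2.0])
```

For $n = 2$ the two values agree with Equations (8.45) and (8.46). A useful sanity check is work conservation: the load-weighted waits $\sum_i \lambda_i E[S_i] W_Q^i$ equal the corresponding FIFO quantity $\lambda E[S]\,W_Q$ with $W_Q = \lambda E[S^2]/(2(1 - \lambda E[S]))$.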


8.6.3 An M/G/1 Optimization Example 


Consider a single-server system where customers arrive according to a Poisson process with rate $\lambda$, and where the service times are independent and have distribution function $G$. Let $\rho = \lambda E[S]$, where $S$ represents a service time random variable, and suppose that $\rho < 1$. Suppose that the server departs whenever a busy period ends and does not return until there are $n$ customers waiting. At that time the server returns and continues serving until the system is once again empty. If the system facility incurs costs at a rate of $c$ per unit time per customer in the system, as well as a cost $K$ each time the server returns, what value of $n$, $n \ge 1$, minimizes the long-run average cost per unit time incurred by the facility, and what is this minimal cost?

To answer the preceding, let us first determine $A(n)$, the average cost per unit time for the policy that returns the server whenever there are $n$ customers waiting. To do so, say that a new cycle begins each time the server returns. As it is easy to see that everything probabilistically starts over when a cycle begins, it follows from the theory of renewal reward processes that if $C(n)$ is the cost incurred in a cycle and $T(n)$ is the time of a cycle, then

$$A(n) = \frac{E[C(n)]}{E[T(n)]}$$


To determine $E[C(n)]$ and $E[T(n)]$, consider the time interval of length, say, $T_i$, starting from the first time during a cycle that there are a total of $i$ customers in the system until the first time afterward that there are only $i - 1$. Therefore, $\sum_{i=1}^{n} T_i$ is the amount of time that the server is busy during a cycle. Adding the additional mean idle time until $n$ customers are in the system gives

$$E[T(n)] = \sum_{i=1}^{n} E[T_i] + n/\lambda$$


Now, consider the system at the moment when a service is about to begin and there are $i - 1$ customers waiting in queue. Since service times do not depend on the order in which customers are served, suppose that the order of service is last come first served, implying that service does not begin on the $i - 1$ presently in queue until these $i - 1$ are the only ones in the system. Thus, we see that the time that it takes to go from $i$ customers in the system to $i - 1$ has the same distribution as the time it takes the M/G/1 system to go from a single customer (just beginning service) to empty; that is, its distribution is that of $B$, the length of an M/G/1 busy period. (Essentially the same argument was made in Example 5.25.) Hence,

$$E[T_i] = E[B] = \frac{E[S]}{1 - \rho}$$

implying that

$$E[T(n)] = \frac{nE[S]}{1 - \lambda E[S]} + \frac{n}{\lambda} = \frac{n}{\lambda(1 - \rho)} \tag{8.48}$$


To determine $E[C(n)]$, let $C_i$ denote the cost incurred during the interval of length $T_i$ that starts with $i - 1$ in queue and a service just beginning and ends when the $i - 1$ in queue are the only customers in the system. Thus, $K + \sum_{i=1}^{n} C_i$ represents the total cost incurred during the busy part of the cycle. In addition, during the idle part of the cycle there will be $i$ customers in the system for an exponential time with rate $\lambda$, $i = 1, \ldots, n - 1$, resulting in an expected cost of $c(1 + \cdots + n - 1)/\lambda$. Consequently,

$$E[C(n)] = K + \sum_{i=1}^{n} E[C_i] + \frac{n(n-1)c}{2\lambda} \tag{8.49}$$


To find $E[C_i]$, consider the moment when the interval of length $T_i$ begins, and let $W_i$ be the sum of the initial service time plus the sum of the times spent in the system by all the customers that arrive (and are served) until the moment when the interval ends and there are only $i - 1$ customers in the system. Then,

$$C_i = (i - 1)cT_i + cW_i$$

where the first term refers to the cost incurred due to the $i - 1$ customers in queue during the interval of length $T_i$. As it is easy to see that $W_i$ has the same distribution as $W_b$, the sum of the times spent in the system by all arrivals in an M/G/1 busy period, we obtain

$$E[C_i] = (i - 1)c\,\frac{E[S]}{1 - \rho} + cE[W_b] \tag{8.50}$$


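Using Equation (8.49), this yields

$$\begin{aligned}
E[C(n)] &= K + \frac{n(n-1)cE[S]}{2(1-\rho)} + ncE[W_b] + \frac{n(n-1)c}{2\lambda} \\
&= K + ncE[W_b] + \frac{n(n-1)c}{2\lambda}\left(\frac{\rho}{1-\rho} + 1\right) \\
&= K + ncE[W_b] + \frac{n(n-1)c}{2\lambda(1-\rho)}
\end{aligned}$$

Utilizing the preceding in conjunction with Equation (8.48) shows that

$$A(n) = \frac{K\lambda(1-\rho)}{n} + \lambda c(1-\rho)E[W_b] + \frac{c(n-1)}{2} \tag{8.51}$$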


To determine $E[W_b]$, we use the result that the average amount of time spent in the system by a customer in the M/G/1 system is

$$W = W_Q + E[S] = \frac{\lambda E[S^2]}{2(1-\rho)} + E[S]$$

However, if we imagine that on day $j$, $j \ge 1$, we earn an amount equal to the total time spent in the system by the $j$th arrival at the M/G/1 system, then it follows from renewal reward processes (since everything probabilistically restarts at the end of a busy period) that

$$W = \frac{E[W_b]}{E[N]}$$

where $N$ is the number of customers served in an M/G/1 busy period. Since $E[N] = 1/(1 - \rho)$ we see that

$$(1-\rho)E[W_b] = W = \frac{\lambda E[S^2]}{2(1-\rho)} + E[S]$$
Therefore, using Equation (8.51), we obtain

$$A(n) = \frac{K\lambda(1-\rho)}{n} + \frac{c\lambda^2E[S^2]}{2(1-\rho)} + \rho c + \frac{c(n-1)}{2}$$

To determine the optimal value of $n$, treat $n$ as a continuous variable and differentiate the preceding to obtain

$$A'(n) = \frac{-K\lambda(1-\rho)}{n^2} + \frac{c}{2}$$

Setting this equal to 0 and solving yields that the optimal value of $n$ is

$$n^* = \sqrt{\frac{2K\lambda(1-\rho)}{c}}$$


and the minimal average cost per unit time is

$$A(n^*) = \sqrt{2\lambda K(1-\rho)c} + \frac{c\lambda^2E[S^2]}{2(1-\rho)} + \rho c - \frac{c}{2}$$
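Since $n$ must be an integer in practice, one might evaluate $A(n)$ at the integers adjacent to $n^*$. The following Python sketch does this; the function names are illustrative assumptions, and the rounding step is an addition of this sketch (it relies on $A(n)$ being convex in $n$), not something stated in the text.

```python
import math

def avg_cost(n, lam, ES, ES2, c, K):
    """A(n): long-run average cost when the server returns at n waiting customers."""
    rho = lam * ES
    return (K * lam * (1 - rho) / n + c * lam**2 * ES2 / (2 * (1 - rho))
            + rho * c + c * (n - 1) / 2)

def optimal_n(lam, ES, c, K):
    """Continuous minimizer n* = sqrt(2 K lam (1 - rho) / c)."""
    return math.sqrt(2 * K * lam * (1 - lam * ES) / c)

def best_integer_n(lam, ES, ES2, c, K):
    """A(n) is convex in n, so the best integer is a neighbor of n* (and at least 1)."""
    n_star = optimal_n(lam, ES, c, K)
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    return min(candidates, key=lambda n: avg_cost(n, lam, ES, ES2, c, K))

# Exponential service with rate 2: E[S] = 0.5, E[S^2] = 0.5.
n_star = optimal_n(lam=1.0, ES=0.5, c=1.0, K=16.0)
n_best = best_integer_n(lam=1.0, ES=0.5, ES2=0.5, c=1.0, K=16.0)
```

With these (hypothetical) parameter values $n^*$ happens to be an integer, so `avg_cost(n_best, ...)` coincides with the closed-form minimum $A(n^*)$.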


It is interesting to see how close we can come to the minimal average cost when we use a simpler policy of the following form: Whenever the server finds the system empty of customers she departs and then returns after a fixed time $t$ has elapsed. Let us say that a new cycle begins each time the server departs. Both the expected costs incurred during the idle and the busy parts of a cycle are obtained by conditioning on $N(t)$, the number of arrivals in the time $t$ that the server is gone. With $\bar{C}(t)$ being the cost incurred during a cycle, we obtain

$$\begin{aligned}
E[\bar{C}(t) \mid N(t)] &= K + \sum_{i=1}^{N(t)} E[C_i] + cN(t)\frac{t}{2} \\
&= K + \frac{N(t)(N(t)-1)cE[S]}{2(1-\rho)} + N(t)cE[W_b] + cN(t)\frac{t}{2}
\end{aligned}$$

The final term of the first equality is the conditional expected cost during the idle time in the cycle and is obtained by using that, given the number of arrivals in the time $t$, the arrival times are independent and uniformly distributed on $(0, t)$; the second equality used Equation (8.50). Since $N(t)$ is Poisson with mean $\lambda t$, it follows that $E[N(t)(N(t) - 1)] = E[N^2(t)] - E[N(t)] = \lambda^2t^2$. Thus, taking the expected value of the preceding gives

$$\begin{aligned}
E[\bar{C}(t)] &= K + \frac{\lambda^2t^2cE[S]}{2(1-\rho)} + \lambda tcE[W_b] + \frac{c\lambda t^2}{2} \\
&= K + \frac{c\lambda t^2}{2(1-\rho)} + \lambda tcE[W_b]
\end{aligned}$$


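Similarly, if $\bar{T}(t)$ is the time of a cycle, then

$$\begin{aligned}
E[\bar{T}(t)] &= E[E[\bar{T}(t) \mid N(t)]] \\
&= E[t + N(t)E[B]] \\
&= t + \frac{\rho t}{1-\rho} \\
&= \frac{t}{1-\rho}
\end{aligned}$$

Hence, the average cost per unit time, call it $\bar{A}(t)$, is

$$\begin{aligned}
\bar{A}(t) &= \frac{E[\bar{C}(t)]}{E[\bar{T}(t)]} \\
&= \frac{K(1-\rho)}{t} + \frac{c\lambda t}{2} + c\lambda(1-\rho)E[W_b]
\end{aligned}$$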


Thus, from Equation (8.51), we see that

$$\bar{A}(n/\lambda) - A(n) = c/2$$

which shows that allowing the return decision to depend on the number presently in the system can reduce the average cost only by the amount $c/2$. ■
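The $c/2$ gap between the two policies can be confirmed numerically. The sketch below evaluates both average-cost expressions, using $(1 - \rho)E[W_b] = W$; the function names and parameter values are hypothetical and chosen only for illustration.

```python
def time_in_system(lam, ES, ES2):
    """W = lam E[S^2] / (2 (1 - rho)) + E[S], which equals (1 - rho) E[W_b]."""
    rho = lam * ES
    return lam * ES2 / (2 * (1 - rho)) + ES

def cost_n_policy(n, lam, ES, ES2, c, K):
    """A(n) from Equation (8.51): server returns when n customers are waiting."""
    rho = lam * ES
    return (K * lam * (1 - rho) / n + lam * c * time_in_system(lam, ES, ES2)
            + c * (n - 1) / 2)

def cost_t_policy(t, lam, ES, ES2, c, K):
    """Average cost when the server returns a fixed time t after departing."""
    rho = lam * ES
    return (K * (1 - rho) / t + c * lam * t / 2
            + c * lam * time_in_system(lam, ES, ES2))

lam, ES, ES2, c, K = 1.5, 0.4, 0.3, 2.0, 10.0
gaps = [cost_t_policy(n / lam, lam, ES, ES2, c, K) - cost_n_policy(n, lam, ES, ES2, c, K)
        for n in range(1, 6)]
```

Every entry of `gaps` equals $c/2$ up to floating-point rounding, regardless of $n$.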


8.6.4 The M/G/1 Queue with Server Breakdown 


Consider a single server queue in which customers arrive according to a Poisson process with rate $\lambda$, and where the amount of service time required by each customer has distribution $G$. Suppose, however, that when working the server breaks down at an exponential rate $\alpha$. That is, the probability a working server will be able to work for an additional time $t$ without breaking down is $e^{-\alpha t}$. When the server breaks down, it immediately goes to the repair facility. The repair time is a random variable with distribution $H$. Suppose that the customer in service when a breakdown occurs has its service continue, when the server returns, from the point it was at when the breakdown occurred. (Therefore, the total amount of time a customer is actually receiving service from a working server has distribution $G$.)

By letting a customer’s “service time” include the time that the customer is 
waiting for the server to come back from being repaired, the preceding is an 
M/G/1 queue. If we let T denote the amount of time from when a customer first 
enters service until it departs the system, then T is a service time random variable 
of this M/G/1 queue. The average amount of time a customer spends waiting in 
queue before its service first commences is, thus,

$$W_Q = \frac{\lambda E[T^2]}{2(1 - \lambda E[T])}$$


To compute $E[T]$ and $E[T^2]$, let $S$, having distribution $G$, be the service requirement of the customer; let $N$ denote the number of times that the server breaks down while the customer is in service; let $R_1, R_2, \ldots$ be the amounts of time the server spends in the repair facility on its successive visits. Then,

$$T = \sum_{i=1}^{N} R_i + S$$
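Conditioning on $S$ yields

$$E[T \mid S = s] = E\left[\sum_{i=1}^{N} R_i \,\Big|\, S = s\right] + s,$$

$$\text{Var}(T \mid S = s) = \text{Var}\left(\sum_{i=1}^{N} R_i \,\Big|\, S = s\right)$$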


Now, a working server always breaks down at an exponential rate $\alpha$. Therefore, given that a customer requires $s$ units of service time, it follows that the number of server breakdowns while that customer is being served is a Poisson random variable with mean $\alpha s$. Consequently, conditional on $S = s$, the random variable $\sum_{i=1}^{N} R_i$ is a compound Poisson random variable with Poisson mean $\alpha s$. Using the results from Examples 3.10 and 3.17, we thus obtain

$$E\left[\sum_{i=1}^{N} R_i \,\Big|\, S = s\right] = \alpha sE[R], \qquad \text{Var}\left(\sum_{i=1}^{N} R_i \,\Big|\, S = s\right) = \alpha sE[R^2]$$

where $R$ has the repair distribution $H$. Therefore,

$$E[T \mid S] = \alpha SE[R] + S = S(1 + \alpha E[R]), \qquad \text{Var}(T \mid S) = \alpha SE[R^2]$$


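Thus,

$$E[T] = E[E[T \mid S]] = E[S](1 + \alpha E[R])$$

and, by the conditional variance formula,

$$\begin{aligned}
\text{Var}(T) &= E[\text{Var}(T \mid S)] + \text{Var}(E[T \mid S]) \\
&= \alpha E[S]E[R^2] + (1 + \alpha E[R])^2\,\text{Var}(S)
\end{aligned}$$

Therefore,

$$\begin{aligned}
E[T^2] &= \text{Var}(T) + (E[T])^2 \\
&= \alpha E[S]E[R^2] + (1 + \alpha E[R])^2E[S^2]
\end{aligned}$$

Consequently, assuming that $\lambda E[T] = \lambda E[S](1 + \alpha E[R]) < 1$, we obtain

$$W_Q = \frac{\lambda\alpha E[S]E[R^2] + \lambda(1 + \alpha E[R])^2E[S^2]}{2(1 - \lambda E[S](1 + \alpha E[R]))}$$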


From the preceding, we can now obtain

$$L_Q = \lambda W_Q, \qquad W = W_Q + E[T], \qquad L = \lambda W$$
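The chain of formulas above translates directly into a short computation. The following Python sketch takes the first two moments of the service requirement $S$ and of the repair time $R$ as inputs; the function name `breakdown_queue` and its argument names are assumptions made for illustration.

```python
def breakdown_queue(lam, alpha, ES, ES2, ER, ER2):
    """M/G/1 with breakdowns at rate alpha while working; returns E[T], E[T^2], W_Q, L_Q, W, L."""
    ET = ES * (1 + alpha * ER)                               # E[T]
    ET2 = alpha * ES * ER2 + (1 + alpha * ER) ** 2 * ES2     # E[T^2]
    if lam * ET >= 1:
        raise ValueError("need lam * E[T] < 1 for a finite W_Q")
    WQ = lam * ET2 / (2 * (1 - lam * ET))
    W = WQ + ET
    return {"ET": ET, "ET2": ET2, "WQ": WQ, "LQ": lam * WQ, "W": W, "L": lam * W}

# With alpha = 0 the server never breaks down and the ordinary M/G/1 result is recovered.
no_breakdowns = breakdown_queue(lam=0.5, alpha=0.0, ES=1.0, ES2=2.0, ER=1.0, ER2=2.0)
with_breakdowns = breakdown_queue(lam=0.5, alpha=1.0, ES=1.0, ES2=2.0, ER=0.5, ER2=0.5)
```

Comparing the two results shows how strongly breakdowns inflate the delay: they raise both the effective load $\lambda E[T]$ and the second moment $E[T^2]$.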


Some other quantities we might be interested in are 


(i) $P_w$, the proportion of time the server is working;
(ii) $P_r$, the proportion of time the server is being repaired;
(iii) $P_I$, the proportion of time the server is idle.


These quantities can all be obtained by using the queueing cost identity. For instance, if we suppose that customers pay 1 per unit time while actually being served, then

average rate at which system earns = $P_w$,
average amount a customer pays = $E[S]$

Therefore, the identity yields

$$P_w = \lambda E[S]$$
To determine $P_r$, suppose a customer whose service is interrupted pays 1 per unit time while the server is being repaired. Then,

average rate at which system earns = $P_r$,
average amount a customer pays = $E\left[\sum_{i=1}^{N} R_i\right] = \alpha E[S]E[R]$

yielding

$$P_r = \lambda\alpha E[S]E[R]$$

Of course, $P_I$ can be obtained from

$$P_I = 1 - P_w - P_r$$


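Remark The quantities $P_w$ and $P_r$ could also have been obtained by first noting that $1 - P_I = \lambda E[T]$ is the proportion of time the server is either working or in repair. Thus,

$$P_w = \lambda E[T]\,\frac{E[S]}{E[T]} = \lambda E[S],$$

$$P_r = \lambda E[T]\,\frac{E[T] - E[S]}{E[T]} = \lambda E[S]\alpha E[R] \qquad ■$$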


8.7 The Model G/M/1 


The model G/M/1 assumes that the times between successive arrivals have an arbitrary distribution $G$. The service times are exponentially distributed with rate $\mu$ and there is a single server.

The immediate difficulty in analyzing this model stems from the fact that the number of customers in the system is not informative enough to serve as a state space. For in summarizing what has occurred up to the present we would need to know not only the number in the system, but also the amount of time that has elapsed since the last arrival (since $G$ is not memoryless). (Why need we not be concerned with the amount of time the person being served has already spent in service?) To get around this problem we shall only look at the system when a customer arrives; and so let us define $X_n$, $n \ge 1$, by

$X_n \equiv$ the number in the system as seen by the $n$th arrival


It is easy to see that the process $\{X_n, n \ge 1\}$ is a Markov chain. To compute the transition probabilities $P_{ij}$ for this Markov chain let us first note that, as long as there are customers to be served, the number of services in any length of time $t$ is a Poisson random variable with mean $\mu t$. This is true since the time between successive services is exponential and, as we know, this implies that the number of services thus constitutes a Poisson process. Hence,

$$P_{i,i+1-j} = \int_0^\infty e^{-\mu t}\frac{(\mu t)^j}{j!}\,dG(t), \quad j = 0, 1, \ldots, i$$

which follows since if an arrival finds $i$ in the system, then the next arrival will find $i + 1$ minus the number served, and the probability that $j$ will be served is easily seen to equal the right side of the preceding (by conditioning on the time between the successive arrivals).

The formula for $P_{i0}$ is a little different (it is the probability that at least $i + 1$ Poisson events occur in a random length of time having distribution $G$) and can be obtained from

$$P_{i0} = 1 - \sum_{j=0}^{i} P_{i,i+1-j}$$


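The limiting probabilities $\pi_k$, $k = 0, 1, \ldots$, can be obtained as the unique solution of

$$\pi_k = \sum_i \pi_i P_{ik}, \quad k \ge 0, \qquad \sum_k \pi_k = 1$$

which, in this case, reduce to

$$\pi_k = \sum_{i=k-1}^{\infty} \pi_i \int_0^\infty e^{-\mu t}\frac{(\mu t)^{i+1-k}}{(i+1-k)!}\,dG(t), \quad k \ge 1, \qquad \sum_{k=0}^{\infty} \pi_k = 1 \tag{8.52}$$

(We have not included the equation $\pi_0 = \sum_i \pi_i P_{i0}$ since one of the equations is always redundant.)

To solve the preceding, let us try a solution of the form $\pi_k = c\beta^k$. Substitution into Equation (8.52) leads to

$$\begin{aligned}
c\beta^k &= c\sum_{i=k-1}^{\infty} \beta^i \int_0^\infty e^{-\mu t}\frac{(\mu t)^{i+1-k}}{(i+1-k)!}\,dG(t) \\
&= c\int_0^\infty e^{-\mu t}\beta^{k-1}\sum_{i=k-1}^{\infty} \frac{(\beta\mu t)^{i+1-k}}{(i+1-k)!}\,dG(t)
\end{aligned} \tag{8.53}$$

However,

$$\sum_{i=k-1}^{\infty} \frac{(\beta\mu t)^{i+1-k}}{(i+1-k)!} = \sum_{j=0}^{\infty} \frac{(\beta\mu t)^j}{j!} = e^{\beta\mu t}$$

and thus Equation (8.53) reduces to

$$\beta^k = \beta^{k-1}\int_0^\infty e^{-\mu t(1-\beta)}\,dG(t)$$

or

$$\beta = \int_0^\infty e^{-\mu t(1-\beta)}\,dG(t) \tag{8.54}$$

The constant $c$ can be obtained from $\sum_k \pi_k = 1$, which implies that

$$c\sum_{k=0}^{\infty} \beta^k = 1$$

or

$$c = 1 - \beta$$

As $(\pi_k)$ is the unique solution to Equation (8.52), and $\pi_k = (1 - \beta)\beta^k$ satisfies it, it follows that

$$\pi_k = (1 - \beta)\beta^k, \quad k = 0, 1, \ldots$$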


where $\beta$ is the solution of Equation (8.54). (It can be shown that if the mean of $G$ is greater than the mean service time $1/\mu$, then there is a unique value of $\beta$ satisfying Equation (8.54) which is between 0 and 1.) The exact value of $\beta$ usually can only be obtained by numerical methods.
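One simple numerical method is fixed-point iteration on Equation (8.54). Its right side is the Laplace transform of $G$ evaluated at $\mu(1 - \beta)$, so the sketch below takes that transform as a callable. This is only one possible approach, not one prescribed by the text; the function name `solve_beta`, the tolerance, and the starting point $\beta = 0$ are assumptions of this sketch. Starting from 0, the iterates increase toward the smallest root in $[0, 1]$, which is the desired $\beta$ when the mean of $G$ exceeds $1/\mu$.

```python
import math

def solve_beta(laplace_G, mu, tol=1e-12, max_iter=100_000):
    """Iterate beta <- E[exp(-mu (1 - beta) X)], X ~ G, starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        new_beta = laplace_G(mu * (1.0 - beta))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("fixed-point iteration did not converge")

lam, mu = 1.0, 2.0

# Exponential interarrival times with rate lam (the M/M/1 case): transform lam / (lam + s).
beta_mm1 = solve_beta(lambda s: lam / (lam + s), mu)

# Deterministic interarrival time d = 1 (the D/M/1 case): transform exp(-s d).
beta_dm1 = solve_beta(lambda s: math.exp(-s * 1.0), mu)
```

In the exponential case the iteration recovers $\beta = \lambda/\mu$, as it must since the model is then M/M/1. With the same arrival rate, the deterministic case yields a smaller $\beta$.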

As $\pi_k$ is the limiting probability that an arrival sees $k$ customers, it is just the $a_k$ as defined in Section 8.2. Hence,

$$a_k = (1 - \beta)\beta^k, \quad k \ge 0 \tag{8.55}$$


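We can obtain $W$ by conditioning on the number in the system when a customer arrives. This yields

$$\begin{aligned}
W &= \sum_k E[\text{time in system} \mid \text{arrival sees } k](1 - \beta)\beta^k \\
&= \sum_k \frac{k+1}{\mu}(1 - \beta)\beta^k \quad \text{(since if an arrival sees $k$ then he spends $k + 1$ service periods in the system)} \\
&= \frac{1}{\mu(1 - \beta)} \quad \left(\text{by using } \sum_{k=0}^{\infty} kx^k = \frac{x}{(1-x)^2}\right)
\end{aligned}$$

and

$$\begin{aligned}
W_Q &= W - \frac{1}{\mu} = \frac{\beta}{\mu(1 - \beta)}, \\
L &= \lambda W = \frac{\lambda}{\mu(1 - \beta)}, \\
L_Q &= \lambda W_Q = \frac{\lambda\beta}{\mu(1 - \beta)}
\end{aligned} \tag{8.56}$$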


where $\lambda$ is the reciprocal of the mean interarrival time. That is,

$$\frac{1}{\lambda} = \int_0^\infty x\,dG(x)$$


In fact, in exactly the same manner as shown for the M/M/1 in Section 8.3.1 and Exercise 4 we can show that

$W^*$ is exponential with rate $\mu(1 - \beta)$,

$$W_Q^* = \begin{cases} 0 & \text{with probability } 1 - \beta \\ \text{exponential with rate } \mu(1 - \beta) & \text{with probability } \beta \end{cases}$$

where $W^*$ and $W_Q^*$ are the amounts of time that a customer spends in system and queue, respectively (their means are $W$ and $W_Q$).

Whereas $a_k = (1 - \beta)\beta^k$ is the probability that an arrival sees $k$ in the system, it is not equal to the proportion of time during which there are $k$ in the system (since the arrival process is not Poisson). To obtain the $P_k$ we first note that the rate at which the number in the system changes from $k - 1$ to $k$ must equal the rate at which it changes from $k$ to $k - 1$ (why?). Now the rate at which it changes from $k - 1$ to $k$ is equal to the arrival rate $\lambda$ multiplied by the proportion of arrivals finding $k - 1$ in the system. That is,

rate number in system goes from $k - 1$ to $k$ = $\lambda a_{k-1}$

Similarly, the rate at which the number in the system changes from $k$ to $k - 1$ is equal to the proportion of time during which there are $k$ in the system multiplied by the (constant) service rate. That is,

rate number in system goes from $k$ to $k - 1$ = $P_k\mu$
Equating these rates yields

$$P_k = \frac{\lambda}{\mu}a_{k-1}, \quad k \ge 1$$

and so, from Equation (8.55),

$$P_k = \frac{\lambda}{\mu}(1 - \beta)\beta^{k-1}, \quad k \ge 1$$


and, as $P_0 = 1 - \sum_{k=1}^{\infty} P_k$, we obtain

$$P_0 = 1 - \frac{\lambda}{\mu}$$
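Once $\beta$ is known, the quantities in Equation (8.56) and the time-average probabilities $P_k$ follow by direct substitution. A minimal Python sketch, with function names chosen here for illustration and $\beta$ supplied as an input (obtained, for example, by solving Equation (8.54) numerically):

```python
def gm1_measures(lam, mu, beta):
    """W, W_Q, L, L_Q of Equation (8.56) for given arrival rate, service rate, and beta."""
    W = 1 / (mu * (1 - beta))
    WQ = beta / (mu * (1 - beta))
    return {"W": W, "WQ": WQ, "L": lam * W, "LQ": lam * WQ}

def gm1_time_probs(lam, mu, beta, kmax):
    """[P_0, ..., P_kmax]: P_0 = 1 - lam/mu and P_k = (lam/mu)(1 - beta) beta^(k-1)."""
    probs = [1 - lam / mu]
    probs += [(lam / mu) * (1 - beta) * beta ** (k - 1) for k in range(1, kmax + 1)]
    return probs

# M/M/1 check: with beta = lam/mu the time averages P_k coincide with a_k = (1 - beta) beta^k.
mm1 = gm1_measures(1.0, 2.0, 0.5)
mm1_probs = gm1_time_probs(1.0, 2.0, 0.5, 5)
other_probs = gm1_time_probs(1.0, 2.0, 0.2, 200)
```

When $\beta \ne \lambda/\mu$, as in `other_probs`, the $P_k$ differ from the $a_k$ but still sum to 1.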


Remarks In the foregoing analysis we guessed at a solution of the stationary probabilities of the Markov chain of the form $\pi_k = c\beta^k$, then verified such a solution by substituting in the stationary Equation (8.52). However, it could have been argued directly that the stationary probabilities of the Markov chain are of this form. To do so, define $\beta_i$ to be the expected number of times that state $i + 1$ is visited in the Markov chain between two successive visits to state $i$, $i \ge 0$. Now it is not difficult to see (and we will let you argue it out for yourself) that

$$\beta_0 = \beta_1 = \beta_2 = \cdots = \beta$$

Now it can be shown by using renewal reward processes that

$$\pi_{i+1} = \frac{E[\text{number of visits to state } i + 1 \text{ in an } i\text{-}i \text{ cycle}]}{E[\text{number of transitions in an } i\text{-}i \text{ cycle}]} = \frac{\beta_i}{1/\pi_i}$$

and so,

$$\pi_{i+1} = \beta_i\pi_i = \beta\pi_i, \quad i \ge 0$$

implying, since $\sum_{i=0}^{\infty} \pi_i = 1$, that

$$\pi_i = \beta^i(1 - \beta), \quad i \ge 0$$

8.7.1 The G/M/1 Busy and Idle Periods 


Suppose that an arrival has just found the system empty—and so initiates a busy 
period—and let N denote the number of customers served in that busy period. 
Since the Nth arrival (after the initiator of the busy period) will also find the 
system empty, it follows that N is the number of transitions for the Markov chain 
(of Section 8.7) to go from state 0 to state 0. Hence, 1/E[N] is the proportion 
of transitions that take the Markov chain into state 0; or equivalently, it is the 
proportion of arrivals that find the system empty. Therefore,

$$E[N] = \frac{1}{a_0} = \frac{1}{1 - \beta}$$

Also, as the next busy period begins after the $N$th interarrival, it follows that the cycle time (that is, the sum of a busy and idle period) is equal to the time until the $N$th interarrival. In other words, the sum of a busy and idle period can be expressed as the sum of $N$ interarrival times. Thus, if $T_i$ is the $i$th interarrival time after the busy period begins, then

$$\begin{aligned}
E[\text{Busy}] + E[\text{Idle}] &= E\left[\sum_{i=1}^{N} T_i\right] \\
&= E[N]E[T] \quad \text{(by Wald’s equation)} \\
&= \frac{1}{\lambda(1 - \beta)}
\end{aligned} \tag{8.57}$$


For a second relation between $E[\text{Busy}]$ and $E[\text{Idle}]$, we can use the same argument as in Section 8.5.3 to conclude that

$$1 - P_0 = \frac{E[\text{Busy}]}{E[\text{Idle}] + E[\text{Busy}]}$$


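and since $P_0 = 1 - \lambda/\mu$, we obtain, upon combining this with (8.57), that

$$E[\text{Busy}] = \frac{1}{\mu(1 - \beta)}, \qquad E[\text{Idle}] = \frac{\mu - \lambda}{\lambda\mu(1 - \beta)}$$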
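As a quick check on these two expressions, the sketch below evaluates them and confirms that they add up to the mean cycle length $1/(\lambda(1 - \beta))$. The function name is an illustrative assumption.

```python
def gm1_busy_idle(lam, mu, beta):
    """Return (E[Busy], E[Idle]) for the G/M/1 queue with the given beta."""
    busy = 1 / (mu * (1 - beta))
    idle = (mu - lam) / (lam * mu * (1 - beta))
    return busy, idle

# M/M/1 check (beta = lam/mu): E[Busy] = 1/(mu - lam) and E[Idle] = 1/lam.
busy, idle = gm1_busy_idle(lam=1.0, mu=2.0, beta=0.5)
```

In the M/M/1 case the mean idle period is $1/\lambda$, as expected from the memoryless property of exponential interarrival times.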
8.8 A Finite Source Model


Consider a system of $m$ machines, whose working times are independent exponential random variables with rate $\lambda$. Upon failure, a machine instantly goes to a repair facility that consists of a single repairperson. If the repairperson is free, repair begins on the machine; otherwise, the machine joins the queue of failed machines. When a machine is repaired it becomes a working machine, and repair begins on a new machine from the queue of failed machines (provided the queue is nonempty). The successive repair times are independent random variables having density function $g$, with mean

$$\mu_R = \int_0^\infty xg(x)\,dx$$


To analyze this system, so as to determine such quantities as the average number of machines that are down and the average time that a machine is down, we will exploit the exponentially distributed working times to obtain a Markov chain. Specifically, let $X_n$ denote the number of failed machines immediately after the $n$th repair occurs, $n \ge 1$. Now, if $X_n = i > 0$, then the situation when the $n$th repair has just occurred is that repair is about to begin on a machine, there are $i - 1$ other machines waiting for repair, and there are $m - i$ working machines, each of which will (independently) continue to work for an exponential time with rate $\lambda$. Similarly, if $X_n = 0$, then all $m$ machines are working and will (independently) continue to do so for exponentially distributed times with rate $\lambda$. Consequently, any information about earlier states of the system will not affect the probability distribution of the number of down machines at the moment of the next repair completion; hence, $\{X_n, n \ge 1\}$ is a Markov chain. To determine its transition probabilities $P_{i,j}$, suppose first that $i > 0$. Conditioning on $R$, the length of the next repair time, and making use of the independence of the $m - i$ remaining working times, yields that for $j \le m - i$

$$\begin{aligned}
P_{i,i-1+j} &= P\{j \text{ failures during } R\} \\
&= \int_0^\infty P\{j \text{ failures during } R \mid R = r\}\,g(r)\,dr \\
&= \int_0^\infty \binom{m-i}{j}(1 - e^{-\lambda r})^j (e^{-\lambda r})^{m-i-j} g(r)\,dr
\end{aligned}$$

If $i = 0$, then, because the next repair will not begin until one of the machines fails,

$$P_{0,j} = P_{1,j}, \quad j \le m - 1$$

Let $\pi_j$, $j = 0, \ldots, m - 1$, denote the stationary probabilities of this Markov chain. That is, they are the unique solution of

$$\pi_j = \sum_i \pi_i P_{i,j}, \qquad \sum_{j=0}^{m-1} \pi_j = 1$$
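To make the computation of the $\pi_j$ concrete, the following Python sketch builds the transition matrix and approximates the stationary probabilities by repeatedly applying the matrix. It rests on two assumptions that are not in the text: the repair distribution is supplied through its Laplace transform $E[e^{-sR}]$ (so the integral is evaluated by expanding $(1 - e^{-\lambda r})^j$ with the binomial theorem), and the stationary equations are solved by simple iteration rather than by a linear solver. All function names are hypothetical.

```python
from math import comb

def transition_matrix(m, lam, laplace_g):
    """P[i][k] on states 0..m-1, where laplace_g(s) = E[exp(-s R)] for a repair time R."""
    def row(i):  # formula for i >= 1: j new failures during one repair time
        r = [0.0] * m
        for j in range(m - i + 1):
            # Binomial expansion turns the integral into a sum of transform values.
            r[i - 1 + j] = comb(m - i, j) * sum(
                comb(j, l) * (-1) ** l * laplace_g(lam * (m - i - j + l))
                for l in range(j + 1)
            )
        return r
    return [row(1)] + [row(i) for i in range(1, m)]  # row 0 equals the i = 1 formula

def stationary(P, iters=10_000):
    """Approximate the stationary vector by iterating pi <- pi P from the uniform vector."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][k] for i in range(n)) for k in range(n)]
    return pi

# Example: m = 3 machines, failure rate 1, exponential repair times with rate 2.
P = transition_matrix(3, 1.0, lambda s: 2.0 / (2.0 + s))
pi = stationary(P)
```

The alternating sum can lose accuracy for large $m$, so this sketch is intended for small systems only. The value `pi[0]` is the $\pi_0$ used in the remainder of the section.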


Therefore, after explicitly determining the transition probabilities and solving the preceding equations, we would know the value of $\pi_0$, the proportion of repair completions that leaves all machines working. Let us say that the system is “on” when all machines are working and “off” otherwise. (Thus, the system is on when the repairperson is idle and off when he is busy.) As all machines are working when the system goes back on, it follows from the lack of memory property of the exponential that the system probabilistically starts over when it goes on. Hence, this on-off system is an alternating renewal process. Suppose that the system has just become on, thus starting a new cycle, and let $R_i$, $i \ge 1$, be the time of the $i$th repair from that moment. Also, let $N$ denote the number of repairs in the off (busy) time of the cycle. Then, it follows that $B$, the length of the off period, can be expressed as

$$B = \sum_{i=1}^{N} R_i$$

Although $N$ is not independent of the sequence $R_1, R_2, \ldots$, it is easy to check that it is a stopping time for this sequence, and thus by Wald’s equation (see Exercise 13 of Chapter 7) we have

$$E[B] = E[N]E[R] = E[N]\mu_R$$


Also, since an on time will last until one of the machines fails, and since the minimum of independent exponential random variables is exponential with a rate equal to the sum of their rates, it follows that $E[I]$, the mean on (idle) time in a cycle, is given by

$$E[I] = 1/(m\lambda)$$

Hence, $P_B$, the proportion of time that the repairperson is busy, satisfies

$$P_B = \frac{E[N]\mu_R}{E[N]\mu_R + 1/(m\lambda)}$$

However, since, on average, one out of every $E[N]$ repair completions will leave all machines working, it follows that

$$\pi_0 = \frac{1}{E[N]}$$

Consequently,

$$P_B = \frac{\mu_R}{\mu_R + \pi_0/(m\lambda)} \tag{8.58}$$

Now focus attention on one of the machines, call it machine number 1, and let $P_{1,R}$ denote the proportion of time that machine 1 is being repaired. Since the proportion of time that the repairperson is busy is $P_B$, and since all machines fail at the same rate and have the same repair distribution, it follows that

$$P_{1,R} = \frac{P_B}{m} = \frac{\mu_R}{m\mu_R + \pi_0/\lambda} \tag{8.59}$$


However, machine 1 alternates between time periods when it is working, when it is waiting in queue, and when it is in repair. Let $W_i$, $Q_i$, $S_i$ denote, respectively, the $i$th working time, the $i$th queueing time, and the $i$th repair time of machine 1, $i \ge 1$. Then, the proportion of time that machine 1 is being repaired during its first $n$ working-queue-repair cycles is as follows:

proportion of time in the first $n$ cycles that machine 1 is being repaired

$$= \frac{\sum_{i=1}^{n} S_i}{\sum_{i=1}^{n} W_i + \sum_{i=1}^{n} Q_i + \sum_{i=1}^{n} S_i} = \frac{\sum_{i=1}^{n} S_i/n}{\sum_{i=1}^{n} W_i/n + \sum_{i=1}^{n} Q_i/n + \sum_{i=1}^{n} S_i/n}$$

Letting $n \to \infty$ and using the strong law of large numbers to conclude that the averages of the $W_i$ and of the $S_i$ converge, respectively, to $1/\lambda$ and $\mu_R$, yields

$$P_{1,R} = \frac{\mu_R}{1/\lambda + \bar{Q} + \mu_R}$$


where $\bar{Q}$ is the average amount of time that machine 1 spends in queue when it fails. Using Equation (8.59), the preceding gives

$$\frac{\mu_R}{m\mu_R + \pi_0/\lambda} = \frac{\mu_R}{1/\lambda + \bar{Q} + \mu_R}$$

or, equivalently, that

$$\bar{Q} = (m - 1)\mu_R - (1 - \pi_0)/\lambda$$


Moreover, since all machines are probabilistically equivalent it follows that $\bar{Q}$ is equal to $W_Q$, the average amount of time that a failed machine spends in queue. To determine the average number of machines in queue, we will make use of the basic queueing identity

$$L_Q = \lambda_a W_Q = \lambda_a \bar{Q}$$


where $\lambda_a$ is the average rate at which machines fail. To determine $\lambda_a$, again focus attention on machine 1 and suppose that we earn one per unit time whenever machine 1 is being repaired. It then follows from the basic cost identity of Equation (8.1) that

$$P_{1,R} = r_1\mu_R$$

where $r_1$ is the average rate at which machine 1 fails. Thus, from Equation (8.59), we obtain

$$r_1 = \frac{1}{m\mu_R + \pi_0/\lambda}$$
Because all $m$ machines fail at the same rate, the preceding implies that

$$\lambda_a = mr_1 = \frac{m}{m\mu_R + \pi_0/\lambda}$$

which gives that the average number of machines in queue is

$$L_Q = \frac{m(m - 1)\mu_R - m(1 - \pi_0)/\lambda}{m\mu_R + \pi_0/\lambda}$$


Since the average number of machines being repaired is $P_B$, the preceding, along with Equation (8.58), shows that the average number of down machines is

$$L = L_Q + P_B = \frac{m^2\mu_R - m(1 - \pi_0)/\lambda}{m\mu_R + \pi_0/\lambda}$$
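Given $\pi_0$, all of the performance measures of this section are closed-form expressions. The Python sketch below collects them; the function name and the returned dictionary keys are illustrative assumptions, and $\pi_0$ is treated as an input that would come from solving the stationary equations.

```python
def finite_source_measures(m, lam, mu_R, pi0):
    """Performance measures of the finite source model, given pi0 from the stationary equations."""
    denom = m * mu_R + pi0 / lam
    P_B = m * mu_R / denom                       # Equation (8.58), rewritten over denom
    Q_bar = (m - 1) * mu_R - (1 - pi0) / lam     # mean time a failed machine spends in queue
    lam_a = m / denom                            # average rate at which machines fail
    L_Q = lam_a * Q_bar                          # average number of machines in queue
    L = (m ** 2 * mu_R - m * (1 - pi0) / lam) / denom  # average number of down machines
    return {"P_B": P_B, "Q_bar": Q_bar, "lam_a": lam_a, "L_Q": L_Q, "L": L}

# With a single machine no queue can form, and pi0 = 1.
one_machine = finite_source_measures(m=1, lam=1.0, mu_R=0.5, pi0=1.0)

# pi0 = 0.4 is a placeholder value used only to exercise the identity L = L_Q + P_B.
three_machines = finite_source_measures(m=3, lam=1.0, mu_R=0.5, pi0=0.4)
```

The single-machine case is a useful sanity check: the queueing delay vanishes and the proportion of time down is $\mu_R/(\mu_R + 1/\lambda)$.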


8.9 Multiserver Queues 


By and large, systems that have more than one server are much more difficult to analyze than those with a single server. In Section 8.9.1 we start first with a Poisson arrival system in which no queue is allowed, and then consider in Section 8.9.2 the infinite capacity M/M/k system. For both of these models we are able to present the limiting probabilities. In Section 8.9.3 we consider the model G/M/k. The analysis here is similar to that of the G/M/1 (Section 8.7) except that in place of a single quantity $\beta$ given as the solution of an integral equation, we have $k$ such quantities. We end in Section 8.9.4 with the model M/G/k for which unfortunately our previous technique (used in M/G/1) no longer enables us to derive $W_Q$, and we content ourselves with an approximation.


8.9.1  Erlang’s Loss System 


A loss system is a queueing system in which arrivals that find all servers busy do not enter but rather are lost to the system. The simplest such system is the M/M/k loss system in which customers arrive according to a Poisson process having rate $\lambda$, enter the system if at least one of the $k$ servers is free, and then spend an exponential amount of time with rate $\mu$ being served. The balance equations for this system are

State              Rate leave = rate enter
0                  $\lambda P_0 = \mu P_1$
1                  $(\lambda + \mu)P_1 = 2\mu P_2 + \lambda P_0$
2                  $(\lambda + 2\mu)P_2 = 3\mu P_3 + \lambda P_1$
$i$, $0 < i < k$     $(\lambda + i\mu)P_i = (i + 1)\mu P_{i+1} + \lambda P_{i-1}$
$k$                $k\mu P_k = \lambda P_{k-1}$

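Rewriting gives

\[
\begin{aligned}
\lambda P_0 &= \mu P_1, \\
\lambda P_1 &= 2\mu P_2, \\
\lambda P_2 &= 3\mu P_3, \\
&\ \vdots \\
\lambda P_{k-1} &= k\mu P_k
\end{aligned}
\]

or

\[
\begin{aligned}
P_1 &= \frac{\lambda}{\mu} P_0, \\
P_2 &= \frac{\lambda}{2\mu} P_1 = \frac{(\lambda/\mu)^2}{2} P_0, \\
P_3 &= \frac{\lambda}{3\mu} P_2 = \frac{(\lambda/\mu)^3}{3!} P_0, \\
&\ \vdots \\
P_k &= \frac{\lambda}{k\mu} P_{k-1} = \frac{(\lambda/\mu)^k}{k!} P_0
\end{aligned}
\]

and using $\sum_{i=0}^{k} P_i = 1$, we obtain

\[ P_i = \frac{(\lambda/\mu)^i/i!}{\sum_{j=0}^{k} (\lambda/\mu)^j/j!}, \qquad i = 0, 1, \ldots, k \]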


Since $E[S] = 1/\mu$, where $E[S]$ is the mean service time, the preceding can be
written as

\[ P_i = \frac{(\lambda E[S])^i/i!}{\sum_{j=0}^{k} (\lambda E[S])^j/j!}, \qquad i = 0, 1, \ldots, k \tag{8.60} \]


Consider now the same system except that the service distribution is general— 
that is, consider the M/G/k with no queue allowed. This model is sometimes 
called the Erlang loss system. It can be shown (though the proof is advanced) 
that Equation (8.60) (which is called Erlang’s loss formula) remains valid for this 
more general system. 
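
As a numerical illustration of Equation (8.60), the following is a minimal Python sketch. The function name `erlang_loss_probabilities` and the sample parameter values are assumptions made for this example, not part of the text.

```python
from math import factorial


def erlang_loss_probabilities(lam, mean_service, k):
    """Limiting probabilities P_0, ..., P_k from Erlang's loss formula (8.60).

    lam          -- Poisson arrival rate (lambda)
    mean_service -- mean service time E[S]
    k            -- number of servers
    """
    a = lam * mean_service  # offered load, lambda * E[S]
    weights = [a**i / factorial(i) for i in range(k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values: lambda = 2, E[S] = 1, k = 2 servers.
probs = erlang_loss_probabilities(2.0, 1.0, 2)
blocking = probs[-1]  # P_k, the proportion of time all servers are busy
```

Because arrivals are Poisson, `probs[-1]` is also the long-run fraction of arrivals that are lost.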


8.9.2 The M/M/k Queue 


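The M/M/k infinite capacity queue can be analyzed by the balance equation
technique. We leave it for you to verify that

\[
P_i =
\begin{cases}
\dfrac{(\lambda/\mu)^i / i!}{\displaystyle\sum_{j=0}^{k-1} \frac{(\lambda/\mu)^j}{j!} + \frac{(\lambda/\mu)^k}{k!}\,\frac{k\mu}{k\mu - \lambda}}, & i \le k \\[3ex]
\dfrac{(\lambda/k\mu)^i\, k^k}{k!}\, P_0, & i > k
\end{cases}
\]

We see from the preceding that we need to impose the condition $\lambda < k\mu$.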
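
The limiting probabilities above can be checked numerically. The sketch below is only an illustration: the function name `mmk_probabilities`, the truncation level `n_max`, and the sample rates are assumptions introduced here.

```python
from math import factorial


def mmk_probabilities(lam, mu, k, n_max):
    """Limiting probabilities P_0, ..., P_{n_max} of the M/M/k queue.

    Requires lam < k * mu, the stability condition stated in the text.
    """
    if lam >= k * mu:
        raise ValueError("need lam < k * mu for limiting probabilities to exist")
    a = lam / mu
    # Normalizing constant: the denominator of the i <= k case.
    norm = sum(a**j / factorial(j) for j in range(k))
    norm += a**k / factorial(k) * (k * mu) / (k * mu - lam)
    p0 = 1.0 / norm
    probs = []
    for i in range(n_max + 1):
        if i <= k:
            probs.append(a**i / factorial(i) * p0)
        else:
            probs.append((lam / (k * mu)) ** i * k**k / factorial(k) * p0)
    return probs


# Illustrative values: lambda = 3, mu = 2, k = 2 servers, states 0..200.
probs = mmk_probabilities(3.0, 2.0, 2, 200)
```

With these values the probabilities sum to 1 up to a negligible truncation error, which is a quick way to confirm the formula.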


8.9.3 The G/M/k Queue


In this model we again suppose that there are $k$ servers, each of whom serves
at an exponential rate $\mu$. However, we now allow the time between successive
arrivals to have an arbitrary distribution $G$. To ensure that a steady-state (or
limiting) distribution exists, we assume the condition $1/\mu_G < k\mu$ where $\mu_G$ is
the mean of $G$.*

The analysis for this model is similar to that presented in Section 8.7 for the
case $k = 1$. Namely, to avoid having to keep track of the time since the last
arrival, we look at the system only at arrival epochs. Once again, if we define $X_n$
as the number in the system at the moment of the $n$th arrival, then $\{X_n, n \ge 0\}$ is
a Markov chain.

To derive the transition probabilities of the Markov chain, it helps to first note
the relationship

\[ X_{n+1} = X_n + 1 - Y_n, \qquad n \ge 0 \]

where $Y_n$ denotes the number of departures during the interarrival time between
the $n$th and $(n + 1)$st arrival. The transition probabilities $P_{ij}$ can now be
calculated as follows:


Case 1: $j > i + 1$.

In this case it easily follows that $P_{ij} = 0$.


Case 2: $j \le i + 1 \le k$.

In this case if an arrival finds $i$ in the system, then as $i < k$ the new arrival will
also immediately enter service. Hence, the next arrival will find $j$ if of the $i + 1$
services exactly $i + 1 - j$ are completed during the interarrival time. Conditioning
on the length of this interarrival time yields

\[
\begin{aligned}
P_{ij} &= P\{i + 1 - j \text{ of } i + 1 \text{ services are completed in an interarrival time}\} \\
&= \int_0^\infty P\{i + 1 - j \text{ of } i + 1 \text{ are completed} \mid \text{interarrival time is } t\}\, dG(t) \\
&= \int_0^\infty \binom{i+1}{j} \left(1 - e^{-\mu t}\right)^{i+1-j} \left(e^{-\mu t}\right)^{j} dG(t)
\end{aligned}
\]

where the last equality follows since the number of service completions in a time
$t$ will have a binomial distribution.


* It follows from the renewal theory (Proposition 7.1) that customers arrive at rate $1/\mu_G$, and as
the maximum service rate is $k\mu$, we clearly need that $1/\mu_G < k\mu$ for limiting probabilities to exist.


Case 3: $i + 1 \ge j \ge k$.

To evaluate $P_{ij}$ in this case we first note that when all servers are busy, the
departure process is a Poisson process with rate $k\mu$ (why?). Hence, again
conditioning on the interarrival time we have

\[
\begin{aligned}
P_{ij} &= P\{i + 1 - j \text{ departures}\} \\
&= \int_0^\infty P\{i + 1 - j \text{ departures in time } t\}\, dG(t) \\
&= \int_0^\infty e^{-k\mu t}\, \frac{(k\mu t)^{i+1-j}}{(i + 1 - j)!}\, dG(t)
\end{aligned}
\]


Case 4: $i + 1 \ge k > j$.

In this case since when all servers are busy the departure process is a Poisson
process, it follows that the length of time until there will only be $k$ in the system
will have a gamma distribution with parameters $i + 1 - k$, $k\mu$ (the time until
$i + 1 - k$ events of a Poisson process with rate $k\mu$ occur is gamma distributed
with parameters $i + 1 - k$, $k\mu$). Conditioning first on the interarrival time and
then on the time until there are only $k$ in the system (call this latter random
variable $T_k$) yields

\[
\begin{aligned}
P_{ij} &= \int_0^\infty P\{i + 1 - j \text{ departures in time } t\}\, dG(t) \\
&= \int_0^\infty \int_0^t P\{i + 1 - j \text{ departures in } t \mid T_k = s\}\, k\mu e^{-k\mu s}\, \frac{(k\mu s)^{i-k}}{(i-k)!}\, ds\, dG(t) \\
&= \int_0^\infty \int_0^t \binom{k}{j} \left(1 - e^{-\mu(t-s)}\right)^{k-j} \left(e^{-\mu(t-s)}\right)^{j} k\mu e^{-k\mu s}\, \frac{(k\mu s)^{i-k}}{(i-k)!}\, ds\, dG(t)
\end{aligned}
\]

where the last equality follows since of the $k$ people in service at time $s$ the number
whose service will end by time $t$ is binomial with parameters $k$ and $1 - e^{-\mu(t-s)}$.

We now can verify either by a direct substitution into the equations
$\pi_j = \sum_i \pi_i P_{ij}$, or by the same argument as presented in the remark at the end
of Section 8.7, that the limiting probabilities of this Markov chain are of
the form

\[ \pi_{k-1+j} = c\beta^j, \qquad j = 0, 1, \ldots \]

Substitution into any of the equations $\pi_j = \sum_i \pi_i P_{ij}$ when $j > k$ yields that $\beta$ is
given as the solution of

\[ \beta = \int_0^\infty e^{-k\mu t(1-\beta)}\, dG(t) \]

The values $\pi_0, \pi_1, \ldots, \pi_{k-2}$ can be obtained by recursively solving the first $k - 1$
of the steady-state equations, and $c$ can then be computed by using $\sum_{i=0}^{\infty} \pi_i = 1$.

If we let $W_Q^*$ denote the amount of time that a customer spends in queue, then
in exactly the same manner as in G/M/1 we can show that

\[
W_Q^* =
\begin{cases}
0, & \text{with probability } \sum_{i=0}^{k-1} \pi_i = 1 - \dfrac{c\beta}{1-\beta} \\[2ex]
\text{Exp}(k\mu(1-\beta)), & \text{with probability } \sum_{i=k}^{\infty} \pi_i = \dfrac{c\beta}{1-\beta}
\end{cases}
\]

where $\text{Exp}(k\mu(1 - \beta))$ is an exponential random variable with rate $k\mu(1 - \beta)$.
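
The integral equation for $\beta$ rarely has a closed-form solution, but it can be solved numerically. The following minimal sketch uses fixed-point iteration started at 0, which increases toward the smallest root. It assumes the interarrival distribution $G$ is supplied through its Laplace transform $s \mapsto E[e^{-sT}]$; the names `solve_beta` and `exponential_transform` and the sample rates are hypothetical choices for this illustration.

```python
def solve_beta(laplace_G, k, mu, tol=1e-12, max_iter=100_000):
    """Solve beta = E[exp(-k*mu*(1 - beta)*T)], T ~ G, by fixed-point iteration.

    laplace_G -- function s -> E[exp(-s*T)] for an interarrival time T
    k, mu     -- number of servers and exponential service rate
    """
    beta = 0.0  # starting at 0, the iterates increase to the smallest root
    for _ in range(max_iter):
        new_beta = laplace_G(k * mu * (1.0 - beta))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("fixed-point iteration did not converge")


def exponential_transform(lam):
    """Laplace transform s -> lam / (lam + s) of an exponential(lam) time."""
    return lambda s: lam / (lam + s)


# Sanity check with Poisson arrivals (so the model is really M/M/k):
# lam = 3, k = 2, mu = 2, for which the root in (0, 1) is lam / (k * mu).
beta = solve_beta(exponential_transform(3.0), k=2, mu=2.0)
```

For exponential interarrival times the equation reduces to a quadratic whose roots are 1 and $\lambda/(k\mu)$, so the sketch can be checked against that value.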


8.9.4 The M/G/k Queue 


In this section we consider the M/G/k system in which customers arrive at a
Poisson rate $\lambda$ and are served by any of $k$ servers, each of whom has the service
distribution $G$. If we attempt to mimic the analysis presented in Section 8.5 for
the M/G/1 system, then we would start with the basic identity

\[ V = \lambda E[S] W_Q + \lambda E[S^2]/2 \tag{8.61} \]

and then attempt to derive a second equation relating $V$ and $W_Q$.
Now if we consider an arbitrary arrival, then we have the following identity:

\[ \text{work in system when customer arrives} = k \times \text{time customer spends in queue} + R \tag{8.62} \]

where $R$ is the sum of the remaining service times of all other customers in service
at the moment when our arrival enters service.

The foregoing follows because while the arrival is waiting in queue, work is
being processed at a rate $k$ per unit time (since all servers are busy). Thus, an
amount of work $k \times$ time in queue is processed while he waits in queue. Now, all
of this work was present when he arrived, and in addition the remaining work on
those still being served when he enters service was also present when he arrived,
so we obtain Equation (8.62). For an illustration, suppose that there are three
servers all of whom are busy when the customer arrives. Suppose, in addition,
that there are no other customers in the system and also that the remaining service
times of the three people in service are 3, 6, and 7. Hence, the work seen by the
arrival is 3 + 6 + 7 = 16. Now the arrival will spend 3 time units in queue, and at
the moment he enters service, the remaining times of the other two customers are
6 − 3 = 3 and 7 − 3 = 4. Hence, R = 3 + 4 = 7 and as a check of Equation (8.62)
we see that 16 = 3 × 3 + 7.
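
The illustration can be reproduced, and extended to the case where other customers are already waiting, with a short sketch. It assumes first-come, first-served order and that all $k$ servers are busy at the arrival instant; the function name `delay_and_residual` and its arguments are hypothetical.

```python
import heapq


def delay_and_residual(remaining, queued_ahead=()):
    """Return (time in queue, R) for an arrival that finds all servers busy.

    remaining    -- remaining service times of the customers now in service
    queued_ahead -- full service times of customers already waiting (FIFO)
    """
    finish = list(remaining)  # finish times, measured from the arrival instant
    heapq.heapify(finish)
    for service in queued_ahead:
        start = heapq.heappop(finish)  # next server to free up takes this customer
        heapq.heappush(finish, start + service)
    enter = heapq.heappop(finish)  # moment our arrival enters service
    residual = sum(f - enter for f in finish)  # remaining work of the others
    return enter, residual


# The three-server illustration from the text: remaining times 3, 6, and 7.
delay, R = delay_and_residual([3, 6, 7])
work = 3 + 6 + 7
identity_holds = (work == 3 * delay + R)  # Equation (8.62) with k = 3
```

Adding a waiting customer, for example `delay_and_residual([3, 6, 7], [5])`, increases the work seen on arrival by that customer's service time, and the identity continues to hold.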

Taking expectations of Equation (8.62) and using the fact that Poisson arrivals
see time averages, we obtain

\[ V = kW_Q + E[R] \]

which, along with Equation (8.61), would enable us to solve for $W_Q$ if we could
compute $E[R]$. However, there is no known method for computing $E[R]$ and, in
fact, there is no known exact formula for $W_Q$. The following approximation
for $W_Q$ was obtained in Reference 6 by using the foregoing approach and then
approximating $E[R]$:

\[
W_Q \approx \frac{\lambda^k E[S^2] (E[S])^{k-1}}
{2(k-1)!\,(k - \lambda E[S])^2 \left[ \displaystyle\sum_{n=0}^{k-1} \frac{(\lambda E[S])^n}{n!} + \frac{(\lambda E[S])^k}{(k-1)!\,(k - \lambda E[S])} \right]}
\tag{8.63}
\]

The preceding approximation has been shown to be quite close to $W_Q$ when the
service distribution is gamma. It is also exact when $G$ is exponential.
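
The claim that the approximation is exact for exponential service can be checked numerically. The sketch below is an illustration under stated assumptions: the function names and parameter values are introduced here, and `mmk_wq` uses the standard M/M/k result that $W_Q$ equals the probability that all servers are busy divided by $k\mu - \lambda$, which is not derived in the text.

```python
from math import factorial


def mgk_wq_approx(lam, es, es2, k):
    """Approximation (8.63) for W_Q in the M/G/k queue.

    lam -- Poisson arrival rate; es, es2 -- E[S] and E[S^2]; k -- servers.
    """
    a = lam * es
    if a >= k:
        raise ValueError("need lam * E[S] < k")
    bracket = sum(a**n / factorial(n) for n in range(k))
    bracket += a**k / (factorial(k - 1) * (k - a))
    numerator = lam**k * es2 * es ** (k - 1)
    return numerator / (2 * factorial(k - 1) * (k - a) ** 2 * bracket)


def mmk_wq(lam, mu, k):
    """Exact W_Q for M/M/k: P{all servers busy} / (k*mu - lam)."""
    a = lam / mu
    tail = a**k / factorial(k) * (k * mu) / (k * mu - lam)
    p0 = 1.0 / (sum(a**j / factorial(j) for j in range(k)) + tail)
    return tail * p0 / (k * mu - lam)


# Exponential service with rate mu = 2: E[S] = 1/2 and E[S^2] = 2/mu^2 = 1/2.
approx = mgk_wq_approx(3.0, 0.5, 0.5, 2)
exact = mmk_wq(3.0, 2.0, 2)
```

For $k = 1$ the same function reduces to the M/G/1 value $\lambda E[S^2]/(2(1 - \lambda E[S]))$.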


Exercises 


1. For the M/M/1 queue, compute 
(a) the expected number of arrivals during a service period and 
(b) the probability that no customers arrive during a service period. 


Hint: “Condition.” 


*2. Machines in a factory break down at an exponential rate of six per hour. There is a 
single repairman who fixes machines at an exponential rate of eight per hour. The 
cost incurred in lost production when machines are out of service is $10 per hour 
per machine. What is the average cost rate incurred due to failed machines? 


3. The manager of a market can hire either Mary or Alice. Mary, who gives service 
at an exponential rate of 20 customers per hour, can be hired at a rate of $3 per 
hour. Alice, who gives service at an exponential rate of 30 customers per hour, can 
be hired at a rate of $C per hour. The manager estimates that, on the average, each 
customer’s time is worth $1 per hour and should be accounted for in the model. 
Assume customers arrive at a Poisson rate of 10 per hour.

(a) What is the average cost per hour if Mary is hired? If Alice is hired? 

(b) Find C if the average cost per hour is the same for Mary and Alice. 


4. Suppose that a customer of the M/M/1 system spends the amount of time $x > 0$
waiting in queue before entering service.
(a) Show that, conditional on the preceding, the number of other customers that
were in the system when the customer arrived is distributed as $1 + P$, where
$P$ is a Poisson random variable with mean $\lambda x$.
(b) Let $W_Q^*$ denote the amount of time that an M/M/1 customer spends in queue.
As a by-product of your analysis in part (a), show that

\[
P\{W_Q^* \le x\} =
\begin{cases}
1 - \dfrac{\lambda}{\mu}, & \text{if } x = 0 \\[2ex]
1 - \dfrac{\lambda}{\mu} + \dfrac{\lambda}{\mu}\left(1 - e^{-(\mu - \lambda)x}\right), & \text{if } x > 0
\end{cases}
\]


5. It follows from Exercise 4 that if, in the M/M/1 model, $W_Q^*$ is the amount of time
that a customer spends waiting in queue, then

\[
W_Q^* =
\begin{cases}
0, & \text{with probability } 1 - \lambda/\mu \\
\text{Exp}(\mu - \lambda), & \text{with probability } \lambda/\mu
\end{cases}
\]

where $\text{Exp}(\mu - \lambda)$ is an exponential random variable with rate $\mu - \lambda$. Using this,
find $\text{Var}(W_Q^*)$.


6. Suppose we want to find the covariance between the times spent in the system by
the first two customers in an M/M/1 queueing system. To obtain this covariance,
let $S_i$ be the service time of customer $i$, $i = 1, 2$, and let $Y$ be the time between the
two arrivals.
(a) Argue that $(S_1 - Y)^+ + S_2$ is the amount of time that customer 2 spends in
the system, where $x^+ = \max(x, 0)$.
(b) Find $\text{Cov}(S_1, (S_1 - Y)^+ + S_2)$.


Hint: Compute both $E[(S_1 - Y)^+]$ and $E[S_1(S_1 - Y)^+]$ by conditioning on whether
$S_1 > Y$.


7. Show that $W$ is smaller in an M/M/1 model having arrivals at rate $\lambda$ and service
at rate $2\mu$ than it is in a two-server M/M/2 model with arrivals at rate $\lambda$ and with
each server at rate $\mu$. Can you give an intuitive explanation for this result? Would
it also be true for $W_Q$?


8. A facility produces items according to a Poisson process with rate $\lambda$. However, it
has shelf space for only $k$ items and so it shuts down production whenever $k$ items
are present. Customers arrive at the facility according to a Poisson process with
rate $\mu$. Each customer wants one item and will immediately depart either with the
item or empty handed if there is no item available.
(a) Find the proportion of customers that go away empty handed.
(b) Find the average time that an item is on the shelf.
(c) Find the average number of items on the shelf.


9. A group of $n$ customers moves around among two servers. Upon completion of
service, the served customer then joins the queue (or enters service if the server is
free) at the other server. All service times are exponential with rate $\mu$. Find the
proportion of time that there are $j$ customers at server 1, $j = 0, \ldots, n$.


10. A group of $m$ customers frequents a single-server station in the following manner.
When a customer arrives, he or she either enters service if the server is free or joins
the queue otherwise. Upon completing service the customer departs the system, but
then returns after an exponential time with rate $\theta$. All service times are exponentially
distributed with rate $\mu$.
(a) Find the average rate at which customers enter the station.
(b) Find the average time that a customer spends in the station per visit.


11. Consider a single-server queue with Poisson arrivals and exponential service times
having the following variation: Whenever a service is completed a departure occurs
only with probability $\alpha$. With probability $1 - \alpha$ the customer, instead of leaving,
joins the end of the queue. Note that a customer may be serviced more than once.
(a) Set up the balance equations and solve for the steady-state probabilities, stating
conditions for it to exist.
(b) Find the expected waiting time of a customer from the time he arrives until he
enters service for the first time.
(c) What is the probability that a customer enters service exactly $n$ times, $n = 1, 2, \ldots$?
(d) What is the expected amount of time that a customer spends in service (which
does not include the time he spends waiting in line)?


Hint: Use part (c).


(e) What is the distribution of the total length of time a customer spends being
served?


Hint: Is it memoryless?


*12. A supermarket has two exponential checkout counters, each operating at rate $\mu$.
Arrivals are Poisson at rate $\lambda$. The counters operate in the following way:
(i) One queue feeds both counters.
(ii) One counter is operated by a permanent checker and the other by a stock
clerk who instantaneously begins checking whenever there are two or more
customers in the system. The clerk returns to stocking whenever he completes
a service, and there are fewer than two customers in the system.
(a) Find $P_n$, proportion of time there are $n$ in the system.
(b) At what rate does the number in the system go from 0 to 1? From 2 to 1?
(c) What proportion of time is the stock clerk checking?


Hint: Be a little careful when there is one in the system.


13. Two customers move about among three servers. Upon completion of service at
server $i$, the customer leaves that server and enters service at whichever of the other
two servers is free. (Therefore, there are always two busy servers.) If the service
times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2, 3$, what proportion of time
is server $i$ idle?


14. Consider a queueing system having two servers and no queue. There are two types
of customers. Type 1 customers arrive according to a Poisson process having rate $\lambda_1$,
and will enter the system if either server is free. The service time of a type 1 customer
is exponential with rate $\mu_1$. Type 2 customers arrive according to a Poisson process
having rate $\lambda_2$. A type 2 customer requires the simultaneous use of both servers;
hence, a type 2 arrival will only enter the system if both servers are free. The time
that it takes (the two servers) to serve a type 2 customer is exponential with rate $\mu_2$.
Once a service is completed on a customer, that customer departs the system.
(a) Define states to analyze the preceding model.
(b) Give the balance equations.
In terms of the solution of the balance equations, find
(c) the average amount of time an entering customer spends in the system;
(d) the fraction of served customers that are type 1.


15. Consider a sequential-service system consisting of two servers, A and B. Arriving
customers will enter this system only if server A is free. If a customer does enter,
then he is immediately served by server A. When his service by A is completed, he
then goes to B if B is free, or if B is busy, he leaves the system. Upon completion of
service at server B, the customer departs. Assume that the (Poisson) arrival rate is
two customers an hour, and that A and B serve at respective (exponential) rates of
four and two customers an hour.
(a) What proportion of customers enter the system?
(b) What proportion of entering customers receive service from B?
(c) What is the average number of customers in the system?
(d) What is the average amount of time that an entering customer spends in the
system?


16. Customers arrive at a two-server system according to a Poisson process having rate
$\lambda = 5$. An arrival finding server 1 free will begin service with that server. An arrival
finding server 1 busy and server 2 free will enter service with server 2. An arrival
finding both servers busy goes away. Once a customer is served by either server,
he departs the system. The service times at server $i$ are exponential with rates $\mu_i$,
where $\mu_1 = 4$, $\mu_2 = 2$.
(a) What is the average time an entering customer spends in the system?
(b) What proportion of time is server 2 busy?


17. Customers arrive at a two-server station in accordance with a Poisson process with
a rate of two per hour. Arrivals finding server 1 free begin service with that server.
Arrivals finding server 1 busy and server 2 free begin service with server 2. Arrivals
finding both servers busy are lost. When a customer is served by server 1, she then
either enters service with server 2 if 2 is free or departs the system if 2 is busy. A
customer completing service at server 2 departs the system. The service times at
server 1 and server 2 are exponential random variables with respective rates of four
and six per hour.
(a) What fraction of customers do not enter the system?
(b) What is the average amount of time that an entering customer spends in the
system?
(c) What fraction of entering customers receives service from server 1?


18. Arrivals to a three-server system are according to a Poisson process with rate $\lambda$.
Arrivals finding server 1 free enter service with 1. Arrivals finding 1 busy but 2 free
enter service with 2. Arrivals finding both 1 and 2 busy do not join the system. After
completion of service at either 1 or 2 the customer will then either go to server 3 if
3 is free or depart the system if 3 is busy. After service at 3 customers depart the
system. The service times at $i$ are exponential with rate $\mu_i$, $i = 1, 2, 3$.
(a) Define states to analyze the above system.
(b) Give the balance equations.
(c) In terms of the solution of the balance equations, what is the average time that
an entering customer spends in the system?
(d) Find the probability that a customer who arrives when the system is empty is
served by server 3.


19. The economy alternates between good and bad periods. During good times
customers arrive at a certain single-server queueing system in accordance with a
Poisson process with rate $\lambda_1$, and during bad times they arrive in accordance with a
Poisson process with rate $\lambda_2$. A good time period lasts for an exponentially
distributed time with rate $\alpha_1$, and a bad time period lasts for an exponential time with
rate $\alpha_2$. An arriving customer will only enter the queueing system if the server is
free; an arrival finding the server busy goes away. All service times are exponential
with rate $\mu$.
(a) Define states so as to be able to analyze this system.
(b) Give a set of linear equations whose solution will yield the long-run proportion
of time the system is in each state.
In terms of the solutions of the equations in part (b),
(c) what proportion of time is the system empty?
(d) what is the average rate at which customers enter the system?



20. There are two types of customers. Type 1 and 2 customers arrive in accordance with
independent Poisson processes with respective rate $\lambda_1$ and $\lambda_2$. There are two servers.
A type 1 arrival will enter service with server 1 if that server is free; if server 1 is
busy and server 2 is free, then the type 1 arrival will enter service with server
2. If both servers are busy, then the type 1 arrival will go away. A type 2
customer can only be served by server 2; if server 2 is free when a type 2 customer
arrives, then the customer enters service with that server. If server 2 is busy when a
type 2 arrives, then that customer goes away. Once a customer is served by either
server, he departs the system. Service times at server $i$ are exponential with rate $\mu_i$,
$i = 1, 2$.

Suppose we want to find the average number of customers in the system.
(a) Define states.
(b) Give the balance equations. Do not attempt to solve them.
In terms of the long-run probabilities, what is
(c) the average number of customers in the system?
(d) the average time a customer spends in the system?


21. Suppose in Exercise 20 we want to find out the proportion of time there is a type 1
customer with server 2. In terms of the long-run probabilities given in Exercise 20,
what is
(a) the rate at which a type 1 customer enters service with server 2?
(b) the rate at which a type 2 customer enters service with server 2?
(c) the fraction of server 2’s customers that are type 1?
(d) the proportion of time that a type 1 customer is with server 2?


22. Customers arrive at a single-server station in accordance with a Poisson process
with rate $\lambda$. All arrivals that find the server free immediately enter service. All service
times are exponentially distributed with rate $\mu$. An arrival that finds the server busy
will leave the system and roam around “in orbit” for an exponential time with rate
$\theta$ at which time it will then return. If the server is busy when an orbiting customer
returns, then that customer returns to orbit for another exponential time with rate $\theta$
before returning again. An arrival that finds the server busy and $N$ other customers
in orbit will depart and not return. That is, $N$ is the maximum number of customers
in orbit.
(a) Define states.
(b) Give the balance equations.
In terms of the solution of the balance equations, find
(c) the proportion of all customers that are eventually served;
(d) the average time that a served customer spends waiting in orbit.


23. Consider the M/M/1 system in which customers arrive at rate $\lambda$ and the server
serves at rate $\mu$. However, suppose that in any interval of length $h$ in which the
server is busy there is a probability $\alpha h + o(h)$ that the server will experience a
breakdown, which causes the system to shut down. All customers that are in the
system depart, and no additional arrivals are allowed to enter until the breakdown
is fixed. The time to fix a breakdown is exponentially distributed with rate $\beta$.
(a) Define appropriate states.
(b) Give the balance equations.
In terms of the long-run probabilities,
(c) what is the average amount of time that an entering customer spends in the
system?
(d) what proportion of entering customers complete their service?
(e) what proportion of customers arrive during a breakdown?

24. Reconsider Exercise 23, but this time suppose that a customer that is in the system
when a breakdown occurs remains there while the server is being fixed. In addition,
suppose that new arrivals during a breakdown period are allowed to enter the
system. What is the average time a customer spends in the system?


25. Poisson ($\lambda$) arrivals join a queue in front of two parallel servers A and B, having
exponential service rates $\mu_A$ and $\mu_B$ (see Figure 8.4). When the system is empty,
arrivals go into server A with probability $\alpha$ and into B with probability $1 - \alpha$.
Otherwise, the head of the queue takes the first free server.
(a) Define states and set up the balance equations. Do not solve.
(b) In terms of the probabilities in part (a), what is the average number in the
system? Average number of servers idle?
(c) In terms of the probabilities in part (a), what is the probability that an arbitrary
arrival will get serviced in A?


26. In a queue with unlimited waiting space, arrivals are Poisson (parameter $\lambda$) and
service times are exponentially distributed (parameter $\mu$). However, the server waits
until $K$ people are present before beginning service on the first customer; thereafter,
he services one at a time until all $K$ units, and all subsequent arrivals, are serviced.
The server is then “idle” until $K$ new arrivals have occurred.
(a) Define an appropriate state space, draw the transition diagram, and set up the
balance equations.
(b) In terms of the limiting probabilities, what is the average time a customer
spends in queue?
(c) What conditions on $\lambda$ and $\mu$ are necessary?


27. Consider a single-server exponential system in which ordinary customers arrive at
a rate $\lambda$ and have service rate $\mu$. In addition, there is a special customer who has
a service rate $\mu_1$. Whenever this special customer arrives, she goes directly into
service (if anyone else is in service, then this person is bumped back into queue).
When the special customer is not being serviced, she spends an exponential amount
of time (with mean $1/\theta$) out of the system.
(a) What is the average arrival rate of the special customer?
(b) Define an appropriate state space and set up balance equations.
(c) Find the probability that an ordinary customer is bumped $n$ times.


28. Let $D$ denote the time between successive departures in a stationary M/M/1 queue
with $\lambda < \mu$. Show, by conditioning on whether or not a departure has left the
system empty, that $D$ is exponential with rate $\lambda$.


Hint: By conditioning on whether or not the departure has left the system empty
we see that

\[
D =
\begin{cases}
\text{Exponential}(\mu), & \text{with probability } \lambda/\mu \\
\text{Exponential}(\lambda) * \text{Exponential}(\mu), & \text{with probability } 1 - \lambda/\mu
\end{cases}
\]

where $\text{Exponential}(\lambda) * \text{Exponential}(\mu)$ represents the sum of two independent
exponential random variables having rates $\lambda$ and $\mu$. Now use moment-generating
functions to show that $D$ has the required distribution.


Note that the preceding does not prove that the departure process is Poisson. To
prove this we need show not only that the interdeparture times are all exponential
with rate $\lambda$, but also that they are independent.


29. Potential customers arrive to a single-server hair salon according to a Poisson
process with rate $\lambda$. A potential customer who finds the server free enters the system; a
potential customer who finds the server busy goes away. Each potential customer
is type $i$ with probability $p_i$, where $p_1 + p_2 + p_3 = 1$. Type 1 customers have
their hair washed by the server; type 2 customers have their hair cut by the server;
and type 3 customers have their hair first washed and then cut by the server. The
time that it takes the server to wash hair is exponentially distributed with rate $\mu_1$,
and the time that it takes the server to cut hair is exponentially distributed with
rate $\mu_2$.
(a) Explain how this system can be analyzed with four states.
(b) Give the equations whose solution yields the proportion of time the system is
in each state.
In terms of the solution of the equations of (b), find
(c) the proportion of time the server is cutting hair;
(d) the average arrival rate of entering customers.


30. For the tandem queue model verify that

\[ P_{n,m} = (\lambda/\mu_1)^n (1 - \lambda/\mu_1)(\lambda/\mu_2)^m (1 - \lambda/\mu_2) \]

satisfies the balance equation (8.15).


31. Consider a network of three stations with a single server at each station. Customers
arrive at stations 1, 2, 3 in accordance with Poisson processes having respective rates
5, 10, and 15. The service times at the three stations are exponential with respective
rates 10, 50, and 100. A customer completing service at station 1 is equally likely to
(i) go to station 2, (ii) go to station 3, or (iii) leave the system. A customer departing
service at station 2 always goes to station 3. A departure from service at station 3
is equally likely to either go to station 2 or leave the system.
(a) What is the average number of customers in the system (consisting of all three
stations)?
(b) What is the average time a customer spends in the system?


32. Consider a closed queueing network consisting of two customers moving among
two servers, and suppose that after each service completion the customer is equally
likely to go to either server; that is, $P_{1,2} = P_{2,1} = \frac{1}{2}$. Let $\mu_i$ denote the exponential
service rate at server $i$, $i = 1, 2$.
(a) Determine the average number of customers at each server.
(b) Determine the service completion rate for each server.


33. Explain how a Markov chain Monte Carlo simulation using the Gibbs sampler can
be utilized to estimate
(a) the distribution of the amount of time spent at server $j$ on a visit.


Hint: Use the arrival theorem.


(b) the proportion of time a customer is with server $j$ (i.e., either in server $j$’s queue
or in service with $j$).


34. For open queueing networks
(a) state and prove the equivalent of the arrival theorem;
(b) derive an expression for the average amount of time a customer spends waiting
in queues.


35. Customers arrive at a single-server station in accordance with a Poisson process
having rate $\lambda$. Each customer has a value. The successive values of customers are
independent and come from a uniform distribution on (0, 1). The service time of a
customer having value $x$ is a random variable with mean $3 + 4x$ and variance 5.
(a) What is the average time a customer spends in the system?
(b) What is the average time a customer having value $x$ spends in the system?


36. Compare the M/G/1 system for first-come, first-served queue discipline with one
of last-come, first-served (for instance, in which units for service are taken from
the top of a stack). Would you think that the queue size, waiting time, and
busy-period distribution differ? What about their means? What if the queue discipline
was always to choose at random among those waiting? Intuitively, which discipline
would result in the smallest variance in the waiting time distribution?

37. In an M/G/1 queue,
(a) what proportion of departures leave behind 0 work?
(b) what is the average work in the system as seen by a departure?

38. For the M/G/1 queue, let $X_n$ denote the number in the system left behind by the
$n$th departure.
(a) If

\[
X_{n+1} =
\begin{cases}
X_n - 1 + Y_n, & \text{if } X_n \ge 1 \\
Y_n, & \text{if } X_n = 0
\end{cases}
\]

what does $Y_n$ represent?
(b) Rewrite the preceding as

\[ X_{n+1} = X_n - 1 + Y_n + \delta_n \tag{8.64} \]

where

\[
\delta_n =
\begin{cases}
1, & \text{if } X_n = 0 \\
0, & \text{if } X_n \ge 1
\end{cases}
\]

Take expectations and let $n \to \infty$ in Equation (8.64) to obtain

\[ E[\delta_\infty] = 1 - \lambda E[S] \]

(c) Square both sides of Equation (8.64), take expectations, and then let $n \to \infty$
to obtain

\[ E[X_\infty] = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S] \]

(d) Argue that $E[X_\infty]$, the average number as seen by a departure, is equal to $L$.

39. Consider an M/G/1 system in which the first customer in a busy period has the
service distribution $G_1$ and all others have distribution $G_2$. Let $C$ denote the number
of customers in a busy period, and let $S$ denote the service time of a customer chosen
at random.
Argue that
(a) $a_0 = P_0 = 1 - \lambda E[S]$.
(b) $E[S] = a_0 E[S_1] + (1 - a_0) E[S_2]$ where $S_i$ has distribution $G_i$.
(c) Use (a) and (b) to show that $E[B]$, the expected length of a busy period, is
given by

\[ E[B] = \frac{E[S_1]}{1 - \lambda E[S_2]} \]

(d) Find $E[C]$.

40. Consider an M/G/1 system with $\lambda E[S] < 1$.
(a) Suppose that service is about to begin at a moment when there are $n$ customers
in the system.
(i) Argue that the additional time until there are only $n - 1$ customers in the
system has the same distribution as a busy period.
(ii) What is the expected additional time until the system is empty?
(b) Suppose that the work in the system at some moment is $A$. We are interested
in the expected additional time until the system is empty; call it $E[T]$. Let $N$
denote the number of arrivals during the first $A$ units of time.
(i) Compute $E[T|N]$.
(ii) Compute $E[T]$.

41. Carloads of customers arrive at a single-server station in accordance with a Poisson

process with rate 4 per hour. The service times are exponentially distributed with 

rate 20 per hour. If each carload contains either 1, 2, or 3 customers with respective 
probabilities tj o and a compute the average customer delay in queue. 


In the two-class priority queueing model of Section 8.6.2, what is $W_Q$? Show that
$W_Q$ is less than it would be under FIFO if $E[S_1] < E[S_2]$ and greater than under
FIFO if $E[S_1] > E[S_2]$.

In a two-class priority queueing model suppose that a cost of $C_i$ per unit time is
incurred for each type $i$ customer that waits in queue, $i = 1, 2$. Show that type 1
customers should be given priority over type 2 (as opposed to the reverse) if

$$\frac{E[S_1]}{C_1} < \frac{E[S_2]}{C_2}$$


Consider the priority queueing model of Section 8.6.2 but now suppose that if a type
2 customer is being served when a type 1 arrives then the type 2 customer is bumped
out of service. This is called the preemptive case. Suppose that when a bumped type
2 customer goes back in service his service begins at the point where it left off when
he was bumped.
(a) Argue that the work in the system at any time is the same as in the
nonpreemptive case.
(b) Derive $W_Q^1$.
Hint: How do type 2 customers affect type 1s?
(c) Why is it not true that

$$V_Q^2 = \lambda_2 E[S_2] W_Q^2$$

(d) Argue that the work seen by a type 2 arrival is the same as in the nonpreemptive
case, and so

$$W_Q^2 = W_Q^2(\text{nonpreemptive}) + E[\text{extra time}]$$

where the extra time is due to the fact that he may be bumped.
(e) Let $N$ denote the number of times a type 2 customer is bumped. Why is

$$E[\text{extra time} \mid N] = \frac{N E[S_1]}{1 - \lambda_1 E[S_1]}$$

Hint: When a type 2 is bumped, relate the time until he gets back in service
to a “busy period.”

(f) Let $S_2$ denote the service time of a type 2. What is $E[N \mid S_2]$?

(g) Combine the preceding to obtain

$$W_Q^2 = W_Q^2(\text{nonpreemptive}) + \frac{\lambda_1 E[S_1] E[S_2]}{1 - \lambda_1 E[S_1]}$$


Calculate explicitly (not in terms of limiting probabilities) the average time a cus- 
tomer spends in the system in Exercise 24. 

In the G/M/1 model if $G$ is exponential with rate $\lambda$ show that $\beta = \lambda/\mu$.

Verify Erlang’s loss formula, Equation (8.60), when $k = 1$.

Verify the formula given for the $P_i$ of the M/M/k.

In the Erlang loss system suppose the Poisson arrival rate is A = 2, and suppose 
there are three servers, each of whom has a service distribution that is uniformly 
distributed over (0,2). What proportion of potential customers is lost? 

In the M/M/k system, 

(a) what is the probability that a customer will have to wait in queue? 

(b) determine L and W. 

Verify the formula for the distribution of $W_Q^*$ given for the G/M/k model.

Consider a system where the interarrival times have an arbitrary distribution $F$, and
there is a single server whose service distribution is $G$. Let $D_n$ denote the amount
of time the $n$th customer spends waiting in queue. Interpret $S_n$, $T_n$ so that

$$D_{n+1} = \begin{cases} D_n + S_n - T_n, & \text{if } D_n + S_n - T_n \ge 0 \\ 0, & \text{if } D_n + S_n - T_n < 0 \end{cases}$$

Consider a model in which the interarrival times have an arbitrary distribution $F$,
and there are $k$ servers each having service distribution $G$. What condition on $F$
and $G$ do you think would be necessary for there to exist limiting probabilities?

9.1 Introduction


Reliability theory is concerned with determining the probability that a system, 
possibly consisting of many components, will function. We shall suppose that 
whether or not the system functions is determined solely from a knowledge of 
which components are functioning. For instance, a series system will function 
if and only if all of its components are functioning, while a parallel system will 
function if and only if at least one of its components is functioning. In Section 9.2, 
we explore the possible ways in which the functioning of the system may depend 
upon the functioning of its components. In Section 9.3, we suppose that each 
component will function with some known probability (independently of each 
other) and show how to obtain the probability that the system will function. 
As this probability often is difficult to explicitly compute, we also present use- 
ful upper and lower bounds in Section 9.4. In Section 9.5 we look at a system 
dynamically over time by supposing that each component initially functions and 
does so for a random length of time at which it fails. We then discuss the relation- 
ship between the distribution of the amount of time that a system functions and 
the distributions of the component lifetimes. In particular, it turns out that if the 
amount of time that a component functions has an increasing failure rate on the 
average (IFRA) distribution, then so does the distribution of system lifetime. In 
Section 9.6 we consider the problem of obtaining the mean lifetime of a system. 
In the final section we analyze the system when failed components are subjected 
to repair. 

9.2 Structure Functions


Consider a system consisting of $n$ components, and suppose that each component
is either functioning or has failed. To indicate whether or not the $i$th component
is functioning, we define the indicator variable $x_i$ by

$$x_i = \begin{cases} 1, & \text{if the } i\text{th component is functioning} \\ 0, & \text{if the } i\text{th component has failed} \end{cases}$$

The vector $\mathbf{x} = (x_1, \ldots, x_n)$ is called the state vector. It indicates which of the
components are functioning and which have failed.

We further suppose that whether or not the system as a whole is functioning is
completely determined by the state vector $\mathbf{x}$. Specifically, it is supposed that there
exists a function $\phi(\mathbf{x})$ such that

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if the system is functioning when the state vector is } \mathbf{x} \\ 0, & \text{if the system has failed when the state vector is } \mathbf{x} \end{cases}$$

The function $\phi(\mathbf{x})$ is called the structure function of the system.


Example 9.1 (The Series Structure) A series system functions if and only if all
of its components are functioning. Hence, its structure function is given by

$$\phi(\mathbf{x}) = \min(x_1, \ldots, x_n) = \prod_{i=1}^{n} x_i$$

We shall find it useful to represent the structure of a system in terms of a diagram.
The relevant diagram for the series structure is shown in Figure 9.1. The idea is
that if a signal is initiated at the left end of the diagram then in order for it
to successfully reach the right end, it must pass through all of the components;
hence, they must all be functioning.


Figure 9.1 A series system.


Example 9.2 (The Parallel Structure) A parallel system functions if and only
if at least one of its components is functioning. Hence, its structure function is
given by

$$\phi(\mathbf{x}) = \max(x_1, \ldots, x_n)$$

A parallel structure may be pictorially illustrated by Figure 9.2. This follows since
a signal at the left end can successfully reach the right end as long as at least one
component is functioning.


Figure 9.2 A parallel system.


Example 9.3 (The k-out-of-n Structure) The series and parallel systems are
both special cases of a k-out-of-n system. Such a system functions if and only
if at least $k$ of the $n$ components are functioning. As $\sum_{i=1}^{n} x_i$ equals the number
of functioning components, the structure function of a k-out-of-n system is
given by

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} x_i \ge k \\ 0, & \text{if } \sum_{i=1}^{n} x_i < k \end{cases}$$

Series and parallel systems are respectively n-out-of-n and 1-out-of-n systems.
The two-out-of-three system may be diagrammed as shown in Figure 9.3.


[Diagram: three parallel branches, each a series pair of components: (1, 2), (2, 3), and (1, 3).]

Figure 9.3 A two-out-of-three system.


Example 9.4 (A Four-Component Structure) Consider a system consisting of four
components, and suppose that the system functions if and only if components 1
and 2 both function and at least one of components 3 and 4 functions. Its structure
function is given by

$$\phi(\mathbf{x}) = x_1 x_2 \max(x_3, x_4)$$

Pictorially, the system is as shown in Figure 9.4. A useful identity, easily checked,
is that for binary variables,* $x_i$, $i = 1, \ldots, n$,

$$\max(x_1, \ldots, x_n) = 1 - \prod_{i=1}^{n} (1 - x_i)$$


Figure 9.4


When $n = 2$, this yields

$$\max(x_1, x_2) = 1 - (1 - x_1)(1 - x_2) = x_1 + x_2 - x_1 x_2$$

Hence, the structure function in the example may be written as

$$\phi(\mathbf{x}) = x_1 x_2 (x_3 + x_4 - x_3 x_4)$$


It is natural to assume that replacing a failed component by a functioning one
will never lead to a deterioration of the system. In other words, it is natural to
assume that the structure function $\phi(\mathbf{x})$ is an increasing function of $\mathbf{x}$, that is, if
$x_i \le y_i$, $i = 1, \ldots, n$, then $\phi(\mathbf{x}) \le \phi(\mathbf{y})$. Such an assumption shall be made in this
chapter and the system will be called monotone.
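
The structure functions of Examples 9.1 through 9.4 are easy to experiment with numerically. The following Python sketch is only an illustration: the function names (`series`, `parallel`, `k_out_of_n`, `example_9_4`, `max_via_product`, `is_monotone`) are hypothetical and not part of the text. A state vector is represented as a tuple of 0s and 1s, and monotonicity is checked by brute force, using the fact that it suffices to compare state vectors that differ in a single component.

```python
from itertools import product

def series(x):
    """Example 9.1: the system works only if every component works."""
    return min(x)

def parallel(x):
    """Example 9.2: the system works if at least one component works."""
    return max(x)

def k_out_of_n(k):
    """Example 9.3: returns the structure function of a k-out-of-n system."""
    def phi(x):
        return 1 if sum(x) >= k else 0
    return phi

def example_9_4(x):
    """Example 9.4: components 1 and 2, and at least one of 3 and 4."""
    x1, x2, x3, x4 = x
    return x1 * x2 * max(x3, x4)

def max_via_product(x):
    """The identity max(x_1, ..., x_n) = 1 - prod(1 - x_i) for binary x_i."""
    prod = 1
    for xi in x:
        prod *= 1 - xi
    return 1 - prod

def is_monotone(phi, n):
    """True if repairing any single failed component never lowers phi."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if phi(x) > phi(y):
                    return False
    return True
```

For instance, `is_monotone(example_9_4, 4)` holds, whereas a function such as `lambda x: 1 - x[0]` fails the check because repairing component 1 breaks the system.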


9.2.1 Minimal Path and Minimal Cut Sets 


In this section we show how any system can be represented both as a series 
arrangement of parallel structures and as a parallel arrangement of series struc- 
tures. As a preliminary, we need the following concepts. 

A state vector $\mathbf{x}$ is called a path vector if $\phi(\mathbf{x}) = 1$. If, in addition, $\phi(\mathbf{y}) = 0$
for all $\mathbf{y} < \mathbf{x}$, then $\mathbf{x}$ is said to be a minimal path vector.** If $\mathbf{x}$ is a minimal path
vector, then the set $A = \{i : x_i = 1\}$ is called a minimal path set. In other words,
a minimal path set is a minimal set of components whose functioning ensures the
functioning of the system.


* A binary variable is one that assumes either the value 0 or 1.
** We say that $\mathbf{y} < \mathbf{x}$ if $y_i \le x_i$, $i = 1, \ldots, n$, with $y_i < x_i$ for some $i$.

Example 9.5 Consider a five-component system whose structure is illustrated
by Figure 9.5. Its structure function equals

$$\phi(\mathbf{x}) = \max(x_1, x_2) \max(x_3 x_4, x_5) = (x_1 + x_2 - x_1 x_2)(x_3 x_4 + x_5 - x_3 x_4 x_5)$$

There are four minimal path sets, namely, {1, 3, 4}, {2, 3, 4}, {1, 5}, {2, 5}.


Figure 9.5


Example 9.6 In a k-out-of-n system, there are $\binom{n}{k}$ minimal path sets, namely, all
of the sets consisting of exactly $k$ components.


Let $A_1, \ldots, A_s$ denote the minimal path sets of a given system. We define $\alpha_j(\mathbf{x})$,
the indicator function of the $j$th minimal path set, by

$$\alpha_j(\mathbf{x}) = \begin{cases} 1, & \text{if all the components of } A_j \text{ are functioning} \\ 0, & \text{otherwise} \end{cases}$$

By definition, it follows that the system will function if all the components of at
least one minimal path set are functioning; that is, if $\alpha_j(\mathbf{x}) = 1$ for some $j$. On
the other hand, if the system functions, then the set of functioning components
must include a minimal path set. Therefore, a system will function if and only if
all the components of at least one minimal path set are functioning. Hence,

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if } \alpha_j(\mathbf{x}) = 1 \text{ for some } j \\ 0, & \text{if } \alpha_j(\mathbf{x}) = 0 \text{ for all } j \end{cases}$$

or equivalently,

$$\phi(\mathbf{x}) = \max_j \alpha_j(\mathbf{x}) = \max_j \prod_{i \in A_j} x_i \tag{9.1}$$

Since $\alpha_j(\mathbf{x})$ is a series structure function of the components of the $j$th minimal
path set, Equation (9.1) expresses an arbitrary system as a parallel arrangement
of series systems.


Example 9.7 Consider the system of Example 9.5. Because its minimal path sets
are $A_1 = \{1, 3, 4\}$, $A_2 = \{2, 3, 4\}$, $A_3 = \{1, 5\}$, and $A_4 = \{2, 5\}$, we have by
Equation (9.1) that

$$\phi(\mathbf{x}) = \max\{x_1 x_3 x_4,\ x_2 x_3 x_4,\ x_1 x_5,\ x_2 x_5\} = 1 - (1 - x_1 x_3 x_4)(1 - x_2 x_3 x_4)(1 - x_1 x_5)(1 - x_2 x_5)$$


Figure 9.6


You should verify that this equals the value of $\phi(\mathbf{x})$ given in Example 9.5. (Make
use of the fact that, since $x_i$ equals 0 or 1, $x_i^2 = x_i$.) This representation may be
pictured as shown in Figure 9.6.
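
The verification suggested in Example 9.7 can also be carried out by enumeration. The sketch below is an illustration under stated assumptions, not a method from the text: the helper names (`phi_9_5`, `state`, `minimal_path_sets`, `phi_from_paths`) are hypothetical, components are numbered from 1, and the system is assumed monotone, so that a path set is minimal exactly when it contains no smaller path set.

```python
from itertools import combinations, product

def phi_9_5(x):
    """Structure function of Example 9.5."""
    x1, x2, x3, x4, x5 = x
    return max(x1, x2) * max(x3 * x4, x5)

def state(subset, n):
    """State vector in which exactly the components in `subset` function."""
    return tuple(1 if i in subset else 0 for i in range(1, n + 1))

def minimal_path_sets(phi, n):
    """Enumerate subsets by increasing size; keep path sets with no smaller path set inside."""
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(1, n + 1), size):
            s = set(subset)
            if phi(state(s, n)) == 1 and not any(a <= s for a in found):
                found.append(s)
    return found

def phi_from_paths(paths, x):
    """Equation (9.1): the maximum over minimal path sets of the product (here, min) of x_i."""
    return max(min(x[i - 1] for i in a) for a in paths)

paths = minimal_path_sets(phi_9_5, 5)
agree = all(phi_from_paths(paths, x) == phi_9_5(x)
            for x in product((0, 1), repeat=5))
```

Here `paths` recovers the four minimal path sets listed in Example 9.5, and `agree` confirms that the representation (9.1) matches the original structure function on all 32 state vectors.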


Figure 9.7 The bridge system.


Example 9.8 The system whose structure is as pictured in Figure 9.7 is called
the bridge system. Its minimal path sets are {1, 4}, {1, 3, 5}, {2, 5}, and {2, 3, 4}.
Hence, by Equation (9.1), its structure function may be expressed as

$$\phi(\mathbf{x}) = \max\{x_1 x_4,\ x_1 x_3 x_5,\ x_2 x_5,\ x_2 x_3 x_4\} = 1 - (1 - x_1 x_4)(1 - x_1 x_3 x_5)(1 - x_2 x_5)(1 - x_2 x_3 x_4)$$

This representation of $\phi(\mathbf{x})$ is diagrammed as shown in Figure 9.8.


Figure 9.8


A state vector $\mathbf{x}$ is called a cut vector if $\phi(\mathbf{x}) = 0$. If, in addition, $\phi(\mathbf{y}) = 1$ for all
$\mathbf{y} > \mathbf{x}$, then $\mathbf{x}$ is said to be a minimal cut vector. If $\mathbf{x}$ is a minimal cut vector, then
the set $C = \{i : x_i = 0\}$ is called a minimal cut set. In other words, a minimal
cut set is a minimal set of components whose failure ensures the failure of the
system.

Let $C_1, \ldots, C_k$ denote the minimal cut sets of a given system. We define $\beta_j(\mathbf{x})$,
the indicator function of the $j$th minimal cut set, by

$$\beta_j(\mathbf{x}) = \begin{cases} 1, & \text{if at least one component of the } j\text{th minimal cut set is functioning} \\ 0, & \text{if all of the components of the } j\text{th minimal cut set are not functioning} \end{cases} = \max_{i \in C_j} x_i$$

Since a system is not functioning if and only if all the components of at least one
minimal cut set are not functioning, it follows that

$$\phi(\mathbf{x}) = \prod_{j=1}^{k} \beta_j(\mathbf{x}) = \prod_{j=1}^{k} \max_{i \in C_j} x_i \tag{9.2}$$

Since $\beta_j(\mathbf{x})$ is a parallel structure function of the components of the $j$th minimal
cut set, Equation (9.2) represents an arbitrary system as a series arrangement of
parallel systems.


Example 9.9 The minimal cut sets of the bridge structure shown in Figure 9.9
are {1, 2}, {1, 3, 5}, {2, 3, 4}, and {4, 5}. Hence, from Equation (9.2), we may
express $\phi(\mathbf{x})$ by

$$\begin{aligned} \phi(\mathbf{x}) &= \max(x_1, x_2) \max(x_1, x_3, x_5) \max(x_2, x_3, x_4) \max(x_4, x_5) \\ &= [1 - (1 - x_1)(1 - x_2)][1 - (1 - x_1)(1 - x_3)(1 - x_5)] \\ &\quad \times [1 - (1 - x_2)(1 - x_3)(1 - x_4)][1 - (1 - x_4)(1 - x_5)] \end{aligned}$$

This representation of $\phi(\mathbf{x})$ is pictorially expressed as Figure 9.10.


Figure 9.9


Figure 9.10 Minimal cut representation of the bridge system.
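
Equations (9.1) and (9.2) give two representations of the same structure function, and for the bridge system this can be confirmed by checking all 32 state vectors. The following sketch assumes the minimal path sets of Example 9.8 and the minimal cut sets of Example 9.9; the names `PATH_SETS`, `CUT_SETS`, `phi_paths`, `phi_cuts`, and `representations_agree` are hypothetical.

```python
from itertools import product

PATH_SETS = [{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}]   # Example 9.8
CUT_SETS = [{1, 2}, {1, 3, 5}, {2, 3, 4}, {4, 5}]    # Example 9.9

def phi_paths(x):
    """Equation (9.1): a parallel arrangement of series systems."""
    return max(min(x[i - 1] for i in a) for a in PATH_SETS)

def phi_cuts(x):
    """Equation (9.2): a series arrangement of parallel systems."""
    result = 1
    for c in CUT_SETS:
        result *= max(x[i - 1] for i in c)
    return result

def representations_agree():
    """Compare the two representations on every state vector of the bridge."""
    return all(phi_paths(x) == phi_cuts(x)
               for x in product((0, 1), repeat=5))
```

Calling `representations_agree()` returns `True`, as the theory requires.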


9.3 Reliability of Systems of Independent Components 


In this section, we suppose that $X_i$, the state of the $i$th component, is a random
variable such that

$$P\{X_i = 1\} = p_i = 1 - P\{X_i = 0\}$$

The value $p_i$, which equals the probability that the $i$th component is functioning,
is called the reliability of the $i$th component. If we define $r$ by

$$r = P\{\phi(\mathbf{X}) = 1\}, \quad \text{where } \mathbf{X} = (X_1, \ldots, X_n)$$

then $r$ is called the reliability of the system. When the components, that is, the
random variables $X_i$, $i = 1, \ldots, n$, are independent, we may express $r$ as a function
of the component reliabilities. That is,

$$r = r(\mathbf{p}), \quad \text{where } \mathbf{p} = (p_1, \ldots, p_n)$$

The function $r(\mathbf{p})$ is called the reliability function. We shall assume throughout
the remainder of this chapter that the components are independent.


Example 9.10 (The Series System) The reliability function of the series system
of $n$ independent components is given by

$$\begin{aligned} r(\mathbf{p}) &= P\{\phi(\mathbf{X}) = 1\} \\ &= P\{X_i = 1 \text{ for all } i = 1, \ldots, n\} \\ &= \prod_{i=1}^{n} p_i \end{aligned}$$

Example 9.11 (The Parallel System) The reliability function of the parallel system
of $n$ independent components is given by

$$\begin{aligned} r(\mathbf{p}) &= P\{\phi(\mathbf{X}) = 1\} \\ &= P\{X_i = 1 \text{ for some } i = 1, \ldots, n\} \\ &= 1 - P\{X_i = 0 \text{ for all } i = 1, \ldots, n\} \\ &= 1 - \prod_{i=1}^{n} (1 - p_i) \end{aligned}$$


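Example 9.12 (The k-out-of-n System with Equal Probabilities) Consider a
k-out-of-n system. If $p_i = p$ for all $i = 1, \ldots, n$, then the reliability function is
given by

$$\begin{aligned} r(p, \ldots, p) &= P\{\phi(\mathbf{X}) = 1\} \\ &= P\left\{\sum_{i=1}^{n} X_i \ge k\right\} \\ &= \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i} \end{aligned}$$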


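Example 9.13 (The Two-out-of-Three System) The reliability function of a
two-out-of-three system is given by

$$\begin{aligned} r(\mathbf{p}) &= P\{\phi(\mathbf{X}) = 1\} \\ &= P\{\mathbf{X} = (1, 1, 1)\} + P\{\mathbf{X} = (1, 1, 0)\} + P\{\mathbf{X} = (1, 0, 1)\} + P\{\mathbf{X} = (0, 1, 1)\} \\ &= p_1 p_2 p_3 + p_1 p_2 (1 - p_3) + p_1 (1 - p_2) p_3 + (1 - p_1) p_2 p_3 \\ &= p_1 p_2 + p_1 p_3 + p_2 p_3 - 2 p_1 p_2 p_3 \end{aligned}$$

Example 9.14 (The Three-out-of-Four System) The reliability function of a
three-out-of-four system is given by

$$\begin{aligned} r(\mathbf{p}) &= P\{\mathbf{X} = (1, 1, 1, 1)\} + P\{\mathbf{X} = (1, 1, 1, 0)\} + P\{\mathbf{X} = (1, 1, 0, 1)\} \\ &\quad + P\{\mathbf{X} = (1, 0, 1, 1)\} + P\{\mathbf{X} = (0, 1, 1, 1)\} \\ &= p_1 p_2 p_3 p_4 + p_1 p_2 p_3 (1 - p_4) + p_1 p_2 (1 - p_3) p_4 \\ &\quad + p_1 (1 - p_2) p_3 p_4 + (1 - p_1) p_2 p_3 p_4 \\ &= p_1 p_2 p_3 + p_1 p_2 p_4 + p_1 p_3 p_4 + p_2 p_3 p_4 - 3 p_1 p_2 p_3 p_4 \end{aligned}$$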

Example 9.15 (A Five-Component System) Consider a five-component system
that functions if and only if component 1, component 2, and at least one of the
remaining components function. Its reliability function is given by

$$\begin{aligned} r(\mathbf{p}) &= P\{X_1 = 1, X_2 = 1, \max(X_3, X_4, X_5) = 1\} \\ &= P\{X_1 = 1\} P\{X_2 = 1\} P\{\max(X_3, X_4, X_5) = 1\} \\ &= p_1 p_2 [1 - (1 - p_3)(1 - p_4)(1 - p_5)] \end{aligned}$$

Since $\phi(\mathbf{X})$ is a 0-1 (that is, a Bernoulli) random variable, we may also compute
$r(\mathbf{p})$ by taking its expectation. That is,

$$r(\mathbf{p}) = P\{\phi(\mathbf{X}) = 1\} = E[\phi(\mathbf{X})]$$


Example 9.16 (A Four-Component System) A four-component system that functions
when both components 1 and 4, and at least one of the other components
function has its structure function given by

$$\phi(\mathbf{x}) = x_1 x_4 \max(x_2, x_3)$$

Hence,

$$\begin{aligned} r(\mathbf{p}) &= E[\phi(\mathbf{X})] \\ &= E[X_1 X_4 (1 - (1 - X_2)(1 - X_3))] \\ &= p_1 p_4 [1 - (1 - p_2)(1 - p_3)] \end{aligned}$$

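
Because $r(\mathbf{p}) = E[\phi(\mathbf{X})]$, the reliability function of any small system can be computed by summing $\phi(\mathbf{x})$ times the probability of each state vector. The sketch below illustrates this with exact rational arithmetic; the function names (`reliability`, `two_out_of_three`, `example_9_16`) and the particular component reliabilities are hypothetical choices made only for the illustration.

```python
from fractions import Fraction
from itertools import product

def reliability(phi, p):
    """r(p) = E[phi(X)] for independent components, summing over all 2^n states."""
    total = 0
    for x in product((0, 1), repeat=len(p)):
        prob = 1
        for xi, pi in zip(x, p):
            prob *= pi if xi == 1 else 1 - pi
        total += phi(x) * prob
    return total

def two_out_of_three(x):
    """Structure function of Example 9.13."""
    return 1 if sum(x) >= 2 else 0

def example_9_16(x):
    """Structure function of Example 9.16."""
    x1, x2, x3, x4 = x
    return x1 * x4 * max(x2, x3)

# Arbitrary illustrative reliabilities (assumed, not from the text).
p = (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4))
r_enumerated = reliability(two_out_of_three, p)
r_closed_form = p[0] * p[1] + p[0] * p[2] + p[1] * p[2] - 2 * p[0] * p[1] * p[2]
```

With these values both `r_enumerated` and `r_closed_form` equal 17/24, in agreement with the formula of Example 9.13. The enumeration costs $2^n$ evaluations of $\phi$, so it is practical only for small $n$.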
An important and intuitive property of the reliability function r(p) is given by 
the following proposition. 


Proposition 9.1 If $r(\mathbf{p})$ is the reliability function of a system of independent
components, then $r(\mathbf{p})$ is an increasing function of $\mathbf{p}$.


Proof. By conditioning on $X_i$ and using the independence of the components, we
obtain

$$\begin{aligned} r(\mathbf{p}) &= E[\phi(\mathbf{X})] \\ &= p_i E[\phi(\mathbf{X}) \mid X_i = 1] + (1 - p_i) E[\phi(\mathbf{X}) \mid X_i = 0] \\ &= p_i E[\phi(1_i, \mathbf{X})] + (1 - p_i) E[\phi(0_i, \mathbf{X})] \end{aligned}$$

where

$$\begin{aligned} (1_i, \mathbf{X}) &= (X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n), \\ (0_i, \mathbf{X}) &= (X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n) \end{aligned}$$

Thus,

$$r(\mathbf{p}) = p_i E[\phi(1_i, \mathbf{X}) - \phi(0_i, \mathbf{X})] + E[\phi(0_i, \mathbf{X})]$$

However, since $\phi$ is an increasing function, it follows that

$$E[\phi(1_i, \mathbf{X}) - \phi(0_i, \mathbf{X})] \ge 0$$

and so the preceding is increasing in $p_i$ for all $i$. Hence, the result is proven.


Let us now consider the following situation: A system consisting of $n$ different
components is to be built from a stockpile containing exactly two of each type of
component. How should we use the stockpile so as to maximize our probability
of attaining a functioning system? In particular, should we build two separate
systems, in which case the probability of attaining a functioning one would be

$$\begin{aligned} P\{\text{at least one of the two systems function}\} &= 1 - P\{\text{neither of the systems function}\} \\ &= 1 - [(1 - r(\mathbf{p}))(1 - r(\mathbf{p}'))] \end{aligned}$$

where $p_i$ ($p_i'$) is the probability that the first (second) number $i$ component functions;
or should we build a single system whose $i$th component functions if at least
one of the number $i$ components functions? In this latter case, the probability that
the system will function equals

$$r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')]$$

since $1 - (1 - p_i)(1 - p_i')$ equals the probability that the $i$th component in the
single system will function.* We now show that replication at the component
level is more effective than replication at the system level.


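Theorem 9.1 For any reliability function $r$ and vectors $\mathbf{p}$, $\mathbf{p}'$,

$$r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')] \ge 1 - [1 - r(\mathbf{p})][1 - r(\mathbf{p}')]$$


Proof. Let $X_1, \ldots, X_n, X_1', \ldots, X_n'$ be mutually independent 0-1 random variables
with

$$p_i = P\{X_i = 1\}, \quad p_i' = P\{X_i' = 1\}$$

Since $P\{\max(X_i, X_i') = 1\} = 1 - (1 - p_i)(1 - p_i')$, it follows that

$$r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')] = E(\phi[\max(\mathbf{X}, \mathbf{X}')])$$

However, by the monotonicity of $\phi$, we have that $\phi[\max(\mathbf{X}, \mathbf{X}')]$ is greater than or
equal to both $\phi(\mathbf{X})$ and $\phi(\mathbf{X}')$ and hence is at least as large as $\max[\phi(\mathbf{X}), \phi(\mathbf{X}')]$.
Hence, from the preceding we have

$$\begin{aligned} r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')] &\ge E[\max(\phi(\mathbf{X}), \phi(\mathbf{X}'))] \\ &= P\{\max[\phi(\mathbf{X}), \phi(\mathbf{X}')] = 1\} \\ &= 1 - P\{\phi(\mathbf{X}) = 0, \phi(\mathbf{X}') = 0\} \\ &= 1 - [1 - r(\mathbf{p})][1 - r(\mathbf{p}')] \end{aligned}$$

where the first equality follows from the fact that $\max[\phi(\mathbf{X}), \phi(\mathbf{X}')]$ is a
0-1 random variable and hence its expectation equals the probability that it
equals 1.


* Notation: If $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$, then $\mathbf{x}\mathbf{y} = (x_1 y_1, \ldots, x_n y_n)$. Also, $\max(\mathbf{x}, \mathbf{y}) =
(\max(x_1, y_1), \ldots, \max(x_n, y_n))$ and $\min(\mathbf{x}, \mathbf{y}) = (\min(x_1, y_1), \ldots, \min(x_n, y_n))$.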


As an illustration of the preceding theorem, suppose that we want to build a
series system of two different types of components from a stockpile consisting
of two of each of the kinds of components. Suppose that the reliability of each
component is $\frac{1}{2}$. If we use the stockpile to build two separate systems, then the
probability of attaining a working system is

$$1 - \left(\tfrac{3}{4}\right)^2 = \tfrac{7}{16}$$

while if we build a single system, replicating components, then the probability of
attaining a working system is

$$\left(\tfrac{3}{4}\right)^2 = \tfrac{9}{16}$$

Hence, replicating components leads to a higher reliability than replicating systems
(as, of course, it must by Theorem 9.1).
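
The two probabilities in this illustration can be reproduced with exact fractions. The following sketch assumes, as above, a two-component series system with every component reliability equal to 1/2; the names `series_reliability`, `system_level`, and `component_level` are hypothetical.

```python
from fractions import Fraction

def series_reliability(p):
    """Reliability of a series system: the product of the component reliabilities."""
    r = Fraction(1)
    for pi in p:
        r *= pi
    return r

half = Fraction(1, 2)
p = [half, half]          # first copy of each component type
p_prime = [half, half]    # second copy of each component type

# Replication at the system level: at least one of two separate systems works.
system_level = 1 - (1 - series_reliability(p)) * (1 - series_reliability(p_prime))

# Replication at the component level: position i works if either copy works.
replicated = [1 - (1 - a) * (1 - b) for a, b in zip(p, p_prime)]
component_level = series_reliability(replicated)
```

Here `system_level` is 7/16 and `component_level` is 9/16, consistent with the inequality of Theorem 9.1.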




9.4 Bounds on the Reliability Function 


Consider the bridge system of Example 9.8, which is represented by Figure 9.11.
Using the minimal path representation, we have

$$\phi(\mathbf{x}) = 1 - (1 - x_1 x_4)(1 - x_1 x_3 x_5)(1 - x_2 x_5)(1 - x_2 x_3 x_4)$$

Hence,

$$r(\mathbf{p}) = 1 - E[(1 - X_1 X_4)(1 - X_1 X_3 X_5)(1 - X_2 X_5)(1 - X_2 X_3 X_4)]$$


Figure 9.11


However, since the minimal path sets overlap (that is, they have components
in common), the random variables $(1 - X_1 X_4)$, $(1 - X_1 X_3 X_5)$, $(1 - X_2 X_5)$, and
$(1 - X_2 X_3 X_4)$ are not independent, and thus the expected value of their product
is not equal to the product of their expected values. Therefore, in order to
compute $r(\mathbf{p})$, we must first multiply the four random variables and then take the
expected value. Doing so, using that $X_i^2 = X_i$, we obtain

$$\begin{aligned} r(\mathbf{p}) &= E[X_1 X_4 + X_2 X_5 + X_1 X_3 X_5 + X_2 X_3 X_4 - X_1 X_2 X_3 X_4 - X_1 X_2 X_3 X_5 \\ &\qquad - X_1 X_2 X_4 X_5 - X_1 X_3 X_4 X_5 - X_2 X_3 X_4 X_5 + 2 X_1 X_2 X_3 X_4 X_5] \\ &= p_1 p_4 + p_2 p_5 + p_1 p_3 p_5 + p_2 p_3 p_4 - p_1 p_2 p_3 p_4 - p_1 p_2 p_3 p_5 \\ &\qquad - p_1 p_2 p_4 p_5 - p_1 p_3 p_4 p_5 - p_2 p_3 p_4 p_5 + 2 p_1 p_2 p_3 p_4 p_5 \end{aligned}$$


As can be seen by the preceding example, it is often quite tedious to evaluate 
r(p), and thus it would be useful if we had a simple way of obtaining bounds. 
We now consider two methods for this. 
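
Before turning to bounds, it may help to see that the expanded polynomial for the bridge system agrees with a direct enumeration of $E[\phi(\mathbf{X})]$. The sketch below is an illustrative check only; the function names (`bridge`, `r_enumerated`, `r_polynomial`) and the test reliabilities are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def bridge(x):
    """Bridge structure function via its minimal path sets (Example 9.8)."""
    x1, x2, x3, x4, x5 = x
    return max(x1 * x4, x1 * x3 * x5, x2 * x5, x2 * x3 * x4)

def r_enumerated(p):
    """E[phi(X)] computed by summing over all 32 state vectors."""
    total = 0
    for x in product((0, 1), repeat=5):
        prob = 1
        for xi, pi in zip(x, p):
            prob *= pi if xi else 1 - pi
        total += bridge(x) * prob
    return total

def r_polynomial(p):
    """The expanded reliability polynomial of the bridge system."""
    p1, p2, p3, p4, p5 = p
    return (p1*p4 + p2*p5 + p1*p3*p5 + p2*p3*p4
            - p1*p2*p3*p4 - p1*p2*p3*p5 - p1*p2*p4*p5
            - p1*p3*p4*p5 - p2*p3*p4*p5
            + 2*p1*p2*p3*p4*p5)

# Arbitrary illustrative reliabilities (assumed, not from the text).
p = (Fraction(1, 2), Fraction(1, 3), Fraction(2, 5), Fraction(3, 4), Fraction(5, 7))
same = r_enumerated(p) == r_polynomial(p)
```

With exact fractions the two computations coincide, so `same` is `True`.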

9.4.1 Method of Inclusion and Exclusion 


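The following is a well-known formula for the probability of the union of the
events $E_1, E_2, \ldots, E_n$:

$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum\sum_{i<j} P(E_i E_j) + \sum\sum\sum_{i<j<k} P(E_i E_j E_k) - \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n) \tag{9.3}$$

A result, not as well known, is the following set of inequalities:

$$\begin{aligned} P\left(\bigcup_{i=1}^{n} E_i\right) &\le \sum_{i=1}^{n} P(E_i), \\ P\left(\bigcup_{i=1}^{n} E_i\right) &\ge \sum_{i} P(E_i) - \sum\sum_{i<j} P(E_i E_j), \\ P\left(\bigcup_{i=1}^{n} E_i\right) &\le \sum_{i} P(E_i) - \sum\sum_{i<j} P(E_i E_j) + \sum\sum\sum_{i<j<k} P(E_i E_j E_k), \\ &\ge \cdots \\ &\le \cdots \end{aligned} \tag{9.4}$$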
where the inequality always changes direction as we add an additional term of
the expansion of $P\left(\bigcup_{i=1}^{n} E_i\right)$.

Equation (9.3) is usually proven by induction on the number of events. However,
let us now present another approach that will not only prove Equation (9.3)
but also establish Inequalities (9.4).

To begin, define the indicator variables $I_j$, $j = 1, \ldots, n$, by

$$I_j = \begin{cases} 1, & \text{if } E_j \text{ occurs} \\ 0, & \text{otherwise} \end{cases}$$

Letting

$$N = \sum_{j=1}^{n} I_j$$

then $N$ denotes the number of the $E_j$, $1 \le j \le n$, that occur. Also, let

$$I = \begin{cases} 1, & \text{if } N > 0 \\ 0, & \text{if } N = 0 \end{cases}$$

Then, as

$$1 - I = (1 - 1)^N$$

we obtain, upon application of the binomial theorem, that

$$1 - I = \sum_{i=0}^{N} \binom{N}{i} (-1)^i$$

or

$$I = N - \binom{N}{2} + \binom{N}{3} - \cdots \pm \binom{N}{N} \tag{9.5}$$

We now make use of the following combinatorial identity (which is easily established
by induction on $i$):

$$\binom{n}{i} - \binom{n}{i+1} + \cdots \pm \binom{n}{n} = \binom{n-1}{i-1} \ge 0, \quad i \le n$$

The preceding thus implies that

$$\binom{N}{i} - \binom{N}{i+1} + \cdots \pm \binom{N}{N} \ge 0 \tag{9.6}$$

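From Equations (9.5) and (9.6) we obtain

$$\begin{aligned} I &\le N, && \text{by letting } i = 2 \text{ in (9.6)} \\ I &\ge N - \binom{N}{2}, && \text{by letting } i = 3 \text{ in (9.6)} \end{aligned} \tag{9.7}$$

and so on. Now, since $N \le n$ and $\binom{N}{i} = 0$ whenever $i > N$, we can rewrite
Equation (9.5) as

$$I = \sum_{i=1}^{n} \binom{N}{i} (-1)^{i+1} \tag{9.8}$$

Equation (9.3) and Inequalities (9.4) now follow upon taking expectations of
(9.7) and (9.8). This is the case since

$$E[I] = P\{N > 0\} = P\{\text{at least one of the } E_j \text{ occurs}\} = P\left(\bigcup_{j=1}^{n} E_j\right),$$

$$E[N] = E\left[\sum_{j=1}^{n} I_j\right] = \sum_{j=1}^{n} P(E_j),$$

$$\begin{aligned} E\left[\binom{N}{2}\right] &= E[\text{number of pairs of the } E_j \text{ that occur}] \\ &= E\left[\sum\sum_{i<j} I_i I_j\right] \\ &= \sum\sum_{i<j} P(E_i E_j) \end{aligned}$$

and, in general

$$\begin{aligned} E\left[\binom{N}{i}\right] &= E[\text{number of sets of size } i \text{ that occur}] \\ &= E\left[\sum\sum\cdots\sum_{j_1 < j_2 < \cdots < j_i} I_{j_1} I_{j_2} \cdots I_{j_i}\right] \\ &= \sum\sum\cdots\sum_{j_1 < j_2 < \cdots < j_i} P(E_{j_1} E_{j_2} \cdots E_{j_i}) \end{aligned}$$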
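
The pointwise relations (9.5) through (9.8) between the indicator $I$ and the binomial coefficients of $N$ can be checked numerically for small values of $N$. The sketch below is only such a sanity check, not a proof; the names `indicator_expansion` and `check` are hypothetical. It relies on `math.comb(N, i)` returning 0 when $i > N$.

```python
from math import comb

def indicator_expansion(N, terms):
    """Partial sum N - C(N,2) + C(N,3) - ... using the first `terms` terms of (9.5)."""
    return sum((-1) ** (i + 1) * comb(N, i) for i in range(1, terms + 1))

def check(max_n=12):
    """Verify (9.8) exactly and the alternating bounds (9.7) for N = 0, ..., max_n."""
    for N in range(max_n + 1):
        I = 1 if N > 0 else 0
        # The full expansion (9.8) recovers the indicator exactly.
        if indicator_expansion(N, max_n) != I:
            return False
        # Truncations alternate between upper and lower bounds on I.
        for terms in range(1, max_n + 1):
            partial = indicator_expansion(N, terms)
            if terms % 2 == 1 and not I <= partial:
                return False
            if terms % 2 == 0 and not I >= partial:
                return False
    return True
```

Taking expectations of these pointwise inequalities is what yields Inequalities (9.4), which is why a check on integers $N$ is already informative.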

The bounds expressed in (9.4) are commonly called the inclusion-exclusion
bounds. To apply them in order to obtain bounds on the reliability function, let
$A_1, A_2, \ldots, A_s$ denote the minimal path sets of a given structure $\phi$, and define the
events $E_1, E_2, \ldots, E_s$ by

$$E_i = \{\text{all components in } A_i \text{ function}\}$$

Now, since the system functions if and only if at least one of the events $E_i$ occurs,
we have

$$r(\mathbf{p}) = P\left(\bigcup_{i=1}^{s} E_i\right)$$

Applying (9.4) yields the desired bounds on $r(\mathbf{p})$. The terms in the summation are
computed thusly:

$$\begin{aligned} P(E_i) &= \prod_{l \in A_i} p_l, \\ P(E_i E_j) &= \prod_{l \in A_i \cup A_j} p_l, \\ P(E_i E_j E_k) &= \prod_{l \in A_i \cup A_j \cup A_k} p_l \end{aligned}$$

and so forth for intersections of more than three of the events. (The preceding
follows since, for instance, in order for the event $E_i E_j$ to occur, all of the components
in $A_i$ and all of them in $A_j$ must function; or, in other words, all components
in $A_i \cup A_j$ must function.)

When the $p_i$s are small the probabilities of the intersection of many of the
events $E_i$ should be quite small and the convergence should be relatively rapid.


Example 9.17 Consider the bridge structure with identical component probabilities.
That is, take $p_i$ to equal $p$ for all $i$. Letting $A_1 = \{1, 4\}$, $A_2 = \{1, 3, 5\}$, $A_3 =
\{2, 5\}$, and $A_4 = \{2, 3, 4\}$ denote the minimal path sets, we have

$$P(E_1) = P(E_3) = p^2, \qquad P(E_2) = P(E_4) = p^3$$

Also, because exactly five of the six $= \binom{4}{2}$ unions of $A_i$ and $A_j$ contain four
components (the exception being $A_2 \cup A_4$, which contains all five components),
we have

$$P(E_1 E_2) = P(E_1 E_3) = P(E_1 E_4) = P(E_2 E_3) = P(E_3 E_4) = p^4, \qquad P(E_2 E_4) = p^5$$

Hence, the first two inclusion-exclusion bounds yield

$$2(p^2 + p^3) - 5p^4 - p^5 \le r(p) \le 2(p^2 + p^3)$$

where $r(p) = r(p, p, p, p, p)$. For instance, when $p = 0.2$, we have

$$0.08768 \le r(0.2) \le 0.09600$$

and, when $p = 0.1$,

$$0.02149 \le r(0.1) \le 0.02200$$
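
The bounds of Example 9.17 can be generated mechanically from the minimal path sets, since each term of (9.3) is a product of $p_l$ over a union of path sets. The following sketch is an illustration under that assumption; the names `PATH_SETS` and `inclusion_exclusion_bounds` are hypothetical, and component reliabilities are passed as a dictionary keyed by component number.

```python
from itertools import combinations

PATH_SETS = [{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}]  # bridge system, Example 9.8

def inclusion_exclusion_bounds(path_sets, p):
    """Partial sums of (9.3); they alternate upper bound, lower bound, upper bound, ..."""
    bounds, partial = [], 0.0
    for m in range(1, len(path_sets) + 1):
        term = 0.0
        for group in combinations(path_sets, m):
            union = set().union(*group)   # all components that must function
            prob = 1.0
            for l in union:
                prob *= p[l]
            term += prob
        partial += (-1) ** (m + 1) * term
        bounds.append(partial)
    return bounds

p = {i: 0.2 for i in range(1, 6)}
bounds = inclusion_exclusion_bounds(PATH_SETS, p)
```

With $p = 0.2$, `bounds[0]` and `bounds[1]` reproduce the upper bound 0.09600 and the lower bound 0.08768, and the final partial sum equals the exact reliability (up to floating-point rounding), because the full expansion (9.3) is an identity.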

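Just as we can define events in terms of the minimal path sets whose union
is the event that the system functions, so can we define events in terms of the
minimal cut sets whose union is the event that the system fails. Let $C_1, C_2, \ldots, C_k$
denote the minimal cut sets and define the events $F_1, \ldots, F_k$ by

$$F_i = \{\text{all components in } C_i \text{ are failed}\}$$

Now, because the system is failed if and only if all of the components of at least
one minimal cut set are failed, we have

$$1 - r(\mathbf{p}) = P\left(\bigcup_{i=1}^{k} F_i\right),$$

and so, from Inequalities (9.4),

$$\begin{aligned} 1 - r(\mathbf{p}) &\le \sum_{i} P(F_i), \\ 1 - r(\mathbf{p}) &\ge \sum_{i} P(F_i) - \sum\sum_{i<j} P(F_i F_j), \\ 1 - r(\mathbf{p}) &\le \sum_{i} P(F_i) - \sum\sum_{i<j} P(F_i F_j) + \sum\sum\sum_{i<j<k} P(F_i F_j F_k), \end{aligned}$$

and so on. As

$$\begin{aligned} P(F_i) &= \prod_{l \in C_i} (1 - p_l), \\ P(F_i F_j) &= \prod_{l \in C_i \cup C_j} (1 - p_l), \\ P(F_i F_j F_k) &= \prod_{l \in C_i \cup C_j \cup C_k} (1 - p_l) \end{aligned}$$

the convergence should be relatively rapid when the $p_i$s are large.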

Example 9.18 (A Random Graph) Let us recall from Section 3.6.2 that a graph
consists of a set $N$ of nodes and a set $A$ of pairs of nodes, called arcs. For any two
nodes $i$ and $j$ we say that the sequence of arcs $(i, i_1), (i_1, i_2), \ldots, (i_k, j)$ constitutes
an $i$-$j$ path. If there is an $i$-$j$ path between all the $\binom{n}{2}$ pairs of nodes $i$ and $j$,
$i \ne j$, then the graph is said to be connected. If we think of the nodes of a
graph as representing geographical locations and the arcs as representing direct
communication links between the nodes, then the graph will be connected if
any two nodes can communicate with each other, if not directly then at least
through the use of intermediary nodes.

A graph can always be subdivided into nonoverlapping connected subgraphs
called components. For instance, the graph in Figure 9.12 with nodes $N =
\{1, 2, 3, 4, 5, 6\}$ and arcs $A = \{(1, 2), (1, 3), (2, 3), (4, 5)\}$ consists of three components
(a graph consisting of a single node is considered to be connected).


Figure 9.12


Consider now the random graph having nodes $1, 2, \ldots, n$, which is such that
there is an arc from node $i$ to node $j$ with probability $P_{ij}$. Assume in addition
that the occurrences of these arcs constitute independent events. That is, assume
that the $\binom{n}{2}$ random variables $X_{ij}$, $i \ne j$, are independent where

$$X_{ij} = \begin{cases} 1, & \text{if } (i, j) \text{ is an arc} \\ 0, & \text{otherwise} \end{cases}$$

We are interested in the probability that this graph will be connected.

We can think of the preceding as being a reliability system of $\binom{n}{2}$ components,
each component corresponding to a potential arc. The component is said to work
if the corresponding arc is indeed an arc of the network, and the system is said
to work if the corresponding graph is connected. As the addition of an arc to
a connected graph cannot disconnect the graph, it follows that the structure so
defined is monotone.

Let us start by determining the minimal path and minimal cut sets. It is easy
to see that a graph will not be connected if and only if the set of nodes can be
partitioned into two nonempty subsets $X$ and $X^c$ in such a way that there is no
arc connecting a node from $X$ with one from $X^c$. For instance, if there are six
nodes and if there are no arcs connecting any of the nodes 1, 2, 3, 4 with either 5
or 6, then clearly the graph will not be connected. Thus, we see that any partition
of the nodes into two nonempty subsets $X$ and $X^c$ corresponds to the minimal
cut set defined by

$$\{(i, j) : i \in X, j \in X^c\}$$

As there are $2^{n-1} - 1$ such partitions (there are $2^n - 2$ ways of choosing a nonempty
proper subset $X$ and, as the partition $X, X^c$ is the same as $X^c, X$, we must divide
by 2) there are therefore this number of minimal cut sets.

To determine the minimal path sets, we must characterize a minimal set of
arcs that results in a connected graph. The graph in Figure 9.13 is connected
but it would remain connected if any one of the arcs from the cycle shown in
Figure 9.14 were removed. In fact it is not difficult to see that the minimal path
sets are exactly those sets of arcs that result in a graph being connected but not
having any cycles (a cycle being a path from a node to itself). Such sets of arcs
are called spanning trees (Figure 9.15). It is easily verified that any spanning tree
contains exactly $n - 1$ arcs, and it is a famous result in graph theory (due to
Cayley) that there are exactly $n^{n-2}$ of these minimal path sets.


Figure 9.14


Figure 9.15 Two spanning trees (minimal path sets) when $n = 4$.


Because of the large number of minimal path and minimal cut sets ($n^{n-2}$ and
$2^{n-1} - 1$, respectively), it is difficult to obtain any useful bounds without making
further restrictions. So, let us assume that all the $P_{ij}$ equal the common value $p$.
That is, we suppose that each of the possible arcs exists, independently, with
the same probability $p$. We shall start by deriving a recursive formula for the
probability that the graph is connected, which is computationally useful when
$n$ is not too large, and then we shall present an asymptotic formula for this
probability when $n$ is large.

Let us denote by $P_n$ the probability that the random graph having $n$ nodes is
connected. To derive a recursive formula for $P_n$ we first concentrate attention on
a single node, say node 1, and try to determine the probability that node 1 will
be part of a component of size $k$ in the resultant graph. Now, for a given set of
$k - 1$ other nodes these nodes along with node 1 will form a component if

(i) there are no arcs connecting any of these $k$ nodes with any of the remaining $n - k$
nodes;

(ii) the random graph restricted to these $k$ nodes (and $\binom{k}{2}$ potential arcs, each
independently appearing with probability $p$) is connected.

The probability that (i) and (ii) both occur is

$$q^{k(n-k)} P_k$$

where $q = 1 - p$. As there are $\binom{n-1}{k-1}$ ways of choosing $k - 1$ other nodes (to form
along with node 1 a component of size $k$) we see that

$$P\{\text{node 1 is part of a component of size } k\} = \binom{n-1}{k-1} q^{k(n-k)} P_k, \quad k = 1, 2, \ldots, n$$

Now, since the sum of the foregoing probabilities as $k$ ranges from 1 through $n$
clearly must equal 1, and as the graph is connected if and only if node 1 is part
of a component of size $n$, we see that

$$P_n = 1 - \sum_{k=1}^{n-1} \binom{n-1}{k-1} q^{k(n-k)} P_k, \quad n = 2, 3, \ldots \tag{9.9}$$

Starting with $P_1 = 1$, $P_2 = p$, Equation (9.9) can be used to determine $P_n$ recursively
when $n$ is not too large. It is particularly suited for numerical computation.
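
As an illustration of this numerical use, the recursion (9.9) translates directly into a short loop. The sketch below is one possible implementation, with the hypothetical function name `connection_probabilities`; passing `p` as a `Fraction` keeps the arithmetic exact, while a float works equally well.

```python
from fractions import Fraction
from math import comb

def connection_probabilities(n_max, p):
    """Return [P_1, ..., P_{n_max}] computed from the recursion (9.9)."""
    q = 1 - p
    P = [None, 1]  # P[1] = 1; index 0 is unused so that P[k] matches P_k
    for n in range(2, n_max + 1):
        # Probability that node 1 lies in a component of size k < n, summed over k.
        not_connected = sum(
            comb(n - 1, k - 1) * q ** (k * (n - k)) * P[k]
            for k in range(1, n)
        )
        P.append(1 - not_connected)
    return P[1:]

probs = connection_probabilities(4, Fraction(1, 2))
```

For $p = 1/2$ this gives $P_2 = 1/2$, $P_3 = 1/2$, and $P_4 = 19/32$. The last value matches a direct count, since 38 of the 64 graphs on four labeled nodes are connected.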

To determine an asymptotic formula for $P_n$ when $n$ is large, first note from
Equation (9.9) that since $P_k \le 1$, we have

$$1 - P_n \le \sum_{k=1}^{n-1} \binom{n-1}{k-1} q^{k(n-k)}$$

As it can be shown that for $q < 1$ and $n$ sufficiently large,

$$\sum_{k=1}^{n-1} \binom{n-1}{k-1} q^{k(n-k)} \le (n + 1) q^{n-1}$$

we have that for $n$ large

$$1 - P_n \le (n + 1) q^{n-1} \tag{9.10}$$
To obtain a bound in the other direction, we concentrate our attention on a
particular type of minimal cut set, namely, those that separate one node from
all others in the graph. Specifically, define the minimal cut set $C_i$ as

$$C_i = \{(i, j) : j \ne i\}$$


and define $F_i$ to be the event that all arcs in $C_i$ are not working (and thus, node i
is isolated from the other nodes). Now,

$$1 - P_n = P(\text{graph is not connected}) \ge P\Bigl(\bigcup_i F_i\Bigr)$$

since, if any of the events $F_i$ occur, then the graph will be disconnected. By the
inclusion–exclusion bounds, we have

$$P\Bigl(\bigcup_i F_i\Bigr) \ge \sum_i P(F_i) - \sum\sum_{i<j} P(F_i F_j)$$

As $P(F_i)$ and $P(F_i F_j)$ are just the respective probabilities that a given set of n − 1
arcs and a given set of 2n − 3 arcs are not in the graph (why?), it follows that

$$P(F_i) = q^{n-1}, \qquad P(F_i F_j) = q^{2n-3}, \quad i \ne j$$

and so

$$1 - P_n \ge n q^{n-1} - \binom{n}{2} q^{2n-3}$$


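Combining this with Equation (9.10) yields that for n sufficiently large,

$$n q^{n-1} - \binom{n}{2} q^{2n-3} \le 1 - P_n \le (n+1) q^{n-1}$$

and as $\binom{n}{2} q^{2n-3} / (n q^{n-1}) \to 0$ as $n \to \infty$, we see that, for large n,

$$1 - P_n \approx n q^{n-1}$$

Thus, for instance, when n = 20 and p = 1/2, the probability that the random
graph will be connected is approximately given by

$$P_{20} \approx 1 - 20 \left(\tfrac{1}{2}\right)^{19} \approx 0.99996$$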
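The quality of the approximation can be checked numerically. The sketch below (the helper names `p_connected` and `disconnect_bounds` are ours) computes $1 - P_n$ exactly from recursion (9.9) and compares it with the inclusion–exclusion lower bound and with the asymptotic value $n q^{n-1}$.

```python
from math import comb


def p_connected(n, p):
    """Probability that the random graph on n nodes is connected, via (9.9)."""
    q = 1.0 - p
    P = {1: 1.0}
    for m in range(2, n + 1):
        P[m] = 1.0 - sum(comb(m - 1, k - 1) * q ** (k * (m - k)) * P[k]
                         for k in range(1, m))
    return P[n]


def disconnect_bounds(n, p):
    """Return (lower bound on 1 - P_n, asymptotic approximation n q^(n-1))."""
    q = 1.0 - p
    approx = n * q ** (n - 1)
    lower = approx - comb(n, 2) * q ** (2 * n - 3)
    return lower, approx


n, p = 20, 0.5
exact_disconnect = 1.0 - p_connected(n, p)
lower, approx = disconnect_bounds(n, p)
```

For n = 20 and p = 1/2 the exact value of $1 - P_n$ and the approximation $n q^{n-1}$ agree to several significant digits, which is what the asymptotic argument suggests.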
9.4.2 Second Method for Obtaining Bounds on r(p)


Our second approach to obtaining bounds on r(p) is based on expressing the
desired probability as the probability of the intersection of events. To do so,
let $A_1, A_2, \ldots, A_s$ denote the minimal path sets as before, and define the events
$D_i$, i = 1, ..., s, by

$$D_i = \{\text{at least one component in } A_i \text{ has failed}\}$$

Now since the system will have failed if and only if at least one component in
each of the minimal path sets has failed we have

$$1 - r(\mathbf{p}) = P(D_1 D_2 \cdots D_s) = P(D_1) P(D_2 \mid D_1) \cdots P(D_s \mid D_1 D_2 \cdots D_{s-1}) \tag{9.11}$$


Now it is quite intuitive that the information that at least one component of $A_1$ is
down can only increase the probability that at least one component of $A_2$ is down
(or else leave the probability unchanged if $A_1$ and $A_2$ do not overlap). Hence,
intuitively

$$P(D_2 \mid D_1) \ge P(D_2)$$

To prove this inequality, we write

$$P(D_2) = P(D_2 \mid D_1) P(D_1) + P(D_2 \mid D_1^c)(1 - P(D_1)) \tag{9.12}$$

and note that

$$P(D_2 \mid D_1^c) = P\{\text{at least one failed in } A_2 \mid \text{all functioning in } A_1\} = 1 - \prod_{j \in A_2,\ j \notin A_1} p_j \le 1 - \prod_{j \in A_2} p_j = P(D_2)$$

Hence, from Equation (9.12) we see that

$$P(D_2) \le P(D_2 \mid D_1) P(D_1) + P(D_2)(1 - P(D_1))$$

or

$$P(D_2 \mid D_1) \ge P(D_2)$$
By the same reasoning, it also follows that

$$P(D_i \mid D_1 \cdots D_{i-1}) \ge P(D_i)$$

and so from Equation (9.11) we have

$$1 - r(\mathbf{p}) \ge \prod_i P(D_i)$$

or, equivalently,

$$r(\mathbf{p}) \le 1 - \prod_i \Bigl(1 - \prod_{j \in A_i} p_j\Bigr)$$


To obtain a bound in the other direction, let $C_1, \ldots, C_r$ denote the minimal cut
sets and define the events $U_1, \ldots, U_r$ by

$$U_i = \{\text{at least one component in } C_i \text{ is functioning}\}$$

Then, since the system will function if and only if all of the events $U_i$ occur, we
have

$$r(\mathbf{p}) = P(U_1 U_2 \cdots U_r) = P(U_1) P(U_2 \mid U_1) \cdots P(U_r \mid U_1 \cdots U_{r-1}) \ge \prod_i P(U_i)$$

where the last inequality is established in exactly the same manner as for the $D_i$.
Hence,

$$r(\mathbf{p}) \ge \prod_i \Bigl[1 - \prod_{j \in C_i} (1 - p_j)\Bigr]$$

and we thus have the following bounds for the reliability function:

$$\prod_i \Bigl[1 - \prod_{j \in C_i} (1 - p_j)\Bigr] \le r(\mathbf{p}) \le 1 - \prod_i \Bigl(1 - \prod_{j \in A_i} p_j\Bigr) \tag{9.13}$$

It is to be expected that the upper bound should be close to the actual r(p) if 
there is not too much overlap in the minimal path sets, and the lower bound to 
be close if there is not too much overlap in the minimal cut sets. 
Example 9.19 For the three-out-of-four system the minimal path sets are $A_1 = \{1,2,3\}$,
$A_2 = \{1,2,4\}$, $A_3 = \{1,3,4\}$, and $A_4 = \{2,3,4\}$; and the minimal cut
sets are $C_1 = \{1,2\}$, $C_2 = \{1,3\}$, $C_3 = \{1,4\}$, $C_4 = \{2,3\}$, $C_5 = \{2,4\}$, and
$C_6 = \{3,4\}$. Hence, by Equation (9.13) we have

$$(1 - q_1 q_2)(1 - q_1 q_3)(1 - q_1 q_4)(1 - q_2 q_3)(1 - q_2 q_4)(1 - q_3 q_4) \le r(\mathbf{p}) \le 1 - (1 - p_1 p_2 p_3)(1 - p_1 p_2 p_4)(1 - p_1 p_3 p_4)(1 - p_2 p_3 p_4)$$

where $q_i = 1 - p_i$. For instance, if $p_i = 1/2$ for all i, then the lower bound is
$(3/4)^6 \approx 0.18$ and the upper bound is $1 - (7/8)^4 \approx 0.41$, so the preceding yields

$$0.18 \le r\left(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\right) \le 0.41$$

The exact value for this structure is easily computed to be

$$r\left(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\right) = \tfrac{5}{16} \approx 0.31$$
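The bounds in (9.13) are straightforward to evaluate once the minimal path and cut sets are listed. The following sketch does so for the three-out-of-four system with $p_i = 1/2$ and compares the result with the exact reliability obtained by enumerating all component states. The function names are ours, chosen for illustration.

```python
from itertools import combinations, product
from math import prod


def reliability_bounds(path_sets, cut_sets, p):
    """Bounds (9.13); p maps each component label to its reliability."""
    lower = prod(1 - prod(1 - p[j] for j in C) for C in cut_sets)
    upper = 1 - prod(1 - prod(p[j] for j in A) for A in path_sets)
    return lower, upper


def exact_k_out_of_n(k, p):
    """Exact reliability of a k-out-of-n system by enumerating all states."""
    comps = list(p)
    total = 0.0
    for states in product([0, 1], repeat=len(comps)):
        if sum(states) >= k:
            total += prod(p[c] if s else 1 - p[c]
                          for c, s in zip(comps, states))
    return total


p = {i: 0.5 for i in range(1, 5)}
paths = list(combinations(p, 3))  # minimal path sets: all 3-subsets
cuts = list(combinations(p, 2))   # minimal cut sets: all 2-subsets
lower, upper = reliability_bounds(paths, cuts, p)
exact = exact_k_out_of_n(3, p)
```

Here `lower` equals $(3/4)^6$, `upper` equals $1 - (7/8)^4$, and `exact` equals 5/16, which lies between the two.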


9.5 System Life as a Function of Component Lives 


For a random variable having distribution function G, we define $\bar{G}(a) \equiv 1 - G(a)$
to be the probability that the random variable is greater than a.

Consider a system in which the ith component functions for a random length
of time having distribution $F_i$ and then fails. Once failed it remains in that state
forever. Assuming that the individual component lifetimes are independent, how
can we express the distribution of system lifetime as a function of the system
reliability function r(p) and the individual component lifetime distributions $F_i$,
i = 1, ..., n?

To answer this we first note that the system will function for a length of time
t or greater if and only if it is still functioning at time t. That is, letting F denote
the distribution of system lifetime, we have

$$\bar{F}(t) = P\{\text{system life} > t\} = P\{\text{system is functioning at time } t\}$$

But, by the definition of r(p) we have

$$P\{\text{system is functioning at time } t\} = r(P_1(t), \ldots, P_n(t))$$

where

$$P_i(t) = P\{\text{component } i \text{ is functioning at } t\} = P\{\text{lifetime of } i > t\} = \bar{F}_i(t)$$
Hence, we see that

$$\bar{F}(t) = r(\bar{F}_1(t), \ldots, \bar{F}_n(t)) \tag{9.14}$$

Example 9.20 In a series system, $r(\mathbf{p}) = \prod_{i=1}^n p_i$ and so from Equation (9.14)

$$\bar{F}(t) = \prod_{i=1}^n \bar{F}_i(t)$$

which is, of course, quite obvious since for a series system the system life is equal
to the minimum of the component lives and so will be greater than t if and only
if all component lives are greater than t.

Example 9.21 In a parallel system $r(\mathbf{p}) = 1 - \prod_{i=1}^n (1 - p_i)$ and so

$$\bar{F}(t) = 1 - \prod_{i=1}^n F_i(t)$$

The preceding is also easily derived by noting that, in the case of a parallel system,
the system life is equal to the maximum of the component lives.


For a continuous distribution G, we define $\lambda(t)$, the failure rate function of
G, by

$$\lambda(t) = \frac{g(t)}{\bar{G}(t)}$$

where $g(t) = \frac{d}{dt} G(t)$. In Section 5.2.2, it is shown that if G is the distribution of
the lifetime of an item, then $\lambda(t)$ represents the probability intensity that a t-year-old
item will fail. We say that G is an increasing failure rate (IFR) distribution if
$\lambda(t)$ is an increasing function of t. Similarly, we say that G is a decreasing failure
rate (DFR) distribution if $\lambda(t)$ is a decreasing function of t.


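Example 9.22 (The Weibull Distribution) A random variable is said to have the
Weibull distribution if its distribution is given, for some $\lambda > 0$, $\alpha > 0$, by

$$G(t) = 1 - e^{-(\lambda t)^{\alpha}}, \qquad t \ge 0$$

The failure rate function for a Weibull distribution equals

$$\lambda(t) = \frac{e^{-(\lambda t)^{\alpha}} \alpha (\lambda t)^{\alpha - 1} \lambda}{e^{-(\lambda t)^{\alpha}}} = \alpha \lambda (\lambda t)^{\alpha - 1}$$

Thus, the Weibull distribution is IFR when $\alpha \ge 1$, and DFR when $0 < \alpha \le 1$;
when $\alpha = 1$, $G(t) = 1 - e^{-\lambda t}$, the exponential distribution, which is both IFR
and DFR.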
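The closed form of the Weibull failure rate can be checked against the definition $g(t)/\bar{G}(t)$ numerically. The sketch below uses a central difference for the density; the function names and the step size `h` are our own choices.

```python
import math


def weibull_sf(t, lam, alpha):
    """Survival function G-bar(t) = exp(-(lam * t) ** alpha)."""
    return math.exp(-((lam * t) ** alpha))


def weibull_failure_rate(t, lam, alpha):
    """Closed form alpha * lam * (lam * t) ** (alpha - 1)."""
    return alpha * lam * (lam * t) ** (alpha - 1)


def numeric_failure_rate(t, lam, alpha, h=1e-6):
    """g(t) / G-bar(t), with g approximated by a central difference."""
    g = (weibull_sf(t - h, lam, alpha) - weibull_sf(t + h, lam, alpha)) / (2 * h)
    return g / weibull_sf(t, lam, alpha)


# Failure rate at t = 1 and t = 2 for a DFR, an exponential, and an IFR case.
for alpha in (0.5, 1.0, 2.0):
    print(alpha,
          weibull_failure_rate(1.0, 0.7, alpha),
          weibull_failure_rate(2.0, 0.7, alpha))
```

The printed pairs decrease in t for $\alpha = 0.5$, stay constant for $\alpha = 1$, and increase for $\alpha = 2$, in line with the IFR/DFR classification above.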
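Example 9.23 (The Gamma Distribution) A random variable is said to have a
gamma distribution if its density is given, for some $\lambda > 0$, $\alpha > 0$, by

$$g(t) = \frac{\lambda e^{-\lambda t} (\lambda t)^{\alpha - 1}}{\Gamma(\alpha)}, \qquad t \ge 0$$

where

$$\Gamma(\alpha) \equiv \int_0^{\infty} e^{-t} t^{\alpha - 1}\, dt$$

For the gamma distribution,

$$\frac{1}{\lambda(t)} = \frac{\bar{G}(t)}{g(t)} = \frac{\int_t^{\infty} \lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}\, dx}{\lambda e^{-\lambda t} (\lambda t)^{\alpha - 1}} = \int_t^{\infty} e^{-\lambda (x - t)} \left(\frac{x}{t}\right)^{\alpha - 1} dx$$

With the change of variables u = x − t, we obtain

$$\frac{1}{\lambda(t)} = \int_0^{\infty} e^{-\lambda u} \left(1 + \frac{u}{t}\right)^{\alpha - 1} du$$

Hence, G is IFR when $\alpha \ge 1$ and is DFR when $0 < \alpha \le 1$.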


Suppose that the lifetime distribution of each component in a monotone system
is IFR. Does this imply that the system lifetime is also IFR? To answer this, let us
at first suppose that each component has the same lifetime distribution, which we
denote by G. That is, $F_i(t) = G(t)$, i = 1, ..., n. To determine whether the system
lifetime is IFR, we must compute $\lambda_F(t)$, the failure rate function of F. Now, by
definition,

$$\lambda_F(t) = \frac{(d/dt) F(t)}{\bar{F}(t)} = \frac{(d/dt)[1 - r(\bar{G}(t))]}{r(\bar{G}(t))}$$

where

$$r(\bar{G}(t)) \equiv r(\bar{G}(t), \ldots, \bar{G}(t))$$

Hence,

$$\lambda_F(t) = \frac{r'(\bar{G}(t))}{r(\bar{G}(t))}\, G'(t) = \frac{\bar{G}(t)\, r'(\bar{G}(t))}{r(\bar{G}(t))} \cdot \frac{G'(t)}{\bar{G}(t)} = \lambda_G(t) \left. \frac{p\, r'(p)}{r(p)} \right|_{p = \bar{G}(t)} \tag{9.15}$$

Since $\bar{G}(t)$ is a decreasing function of t, it follows from Equation (9.15) that
if each component of a coherent system has the same IFR lifetime distribution,
then the distribution of system lifetime will be IFR if $p\, r'(p)/r(p)$ is a decreasing
function of p.


Example 9.24 (The k-out-of-n System with Identical Components) Consider the
k-out-of-n system, which will function if and only if k or more components function.
When each component has the same probability p of functioning, the number
of functioning components will have a binomial distribution with parameters
n and p. Hence,

$$r(p) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n - i}$$

which, by continual integration by parts, can be shown to be equal to

$$r(p) = \frac{n!}{(k-1)!\,(n-k)!} \int_0^p x^{k-1} (1 - x)^{n-k}\, dx$$

Upon differentiation, we obtain

$$r'(p) = \frac{n!}{(k-1)!\,(n-k)!}\, p^{k-1} (1 - p)^{n-k}$$

Therefore,

$$\frac{p\, r'(p)}{r(p)} = \left[\frac{r(p)}{p\, r'(p)}\right]^{-1} = \left[\frac{1}{p} \int_0^p \left(\frac{x}{p}\right)^{k-1} \left(\frac{1 - x}{1 - p}\right)^{n-k} dx\right]^{-1}$$

Letting y = x/p yields

$$\frac{p\, r'(p)}{r(p)} = \left[\int_0^1 y^{k-1} \left(\frac{1 - yp}{1 - p}\right)^{n-k} dy\right]^{-1}$$

Since (1 − yp)/(1 − p) is increasing in p, it follows that $p\, r'(p)/r(p)$ is decreasing
in p. Thus, if a k-out-of-n system is composed of independent, like components
having an increasing failure rate, the system itself has an increasing failure
rate.
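The monotonicity of $p\,r'(p)/r(p)$ for a k-out-of-n system can also be observed numerically. The sketch below evaluates the ratio on a grid for n = 5, k = 3; the function names and the grid are illustrative choices of ours.

```python
from math import comb, factorial


def r(p, n, k):
    """Reliability of a k-out-of-n system with identical components."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def r_prime(p, n, k):
    """Derivative of r, from the integral representation in the example."""
    c = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return c * p ** (k - 1) * (1 - p) ** (n - k)


def ratio(p, n, k):
    """The quantity p r'(p) / r(p) appearing in Equation (9.15)."""
    return p * r_prime(p, n, k) / r(p, n, k)


n, k = 5, 3
grid = [i / 20 for i in range(1, 20)]      # p = 0.05, 0.10, ..., 0.95
ratios = [ratio(p, n, k) for p in grid]
```

On this grid the values in `ratios` decrease steadily from just under k toward 0, as the argument in the example predicts.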


It turns out, however, that for a k-out-of-n system, in which the independent
components have different IFR lifetime distributions, the system lifetime need not
be IFR. Consider the following example of a one-out-of-two (that is, a parallel)
system.


Example 9.25 (A Parallel System That Is Not IFR) The life distribution of a
parallel system of two independent components, the ith component having an
exponential distribution with mean 1/i, i = 1, 2, is given by

$$\bar{F}(t) = 1 - (1 - e^{-t})(1 - e^{-2t}) = e^{-2t} + e^{-t} - e^{-3t}$$

Therefore,

$$\lambda(t) = \frac{f(t)}{\bar{F}(t)} = \frac{2e^{-2t} + e^{-t} - 3e^{-3t}}{e^{-2t} + e^{-t} - e^{-3t}}$$

It easily follows upon differentiation that the sign of $\lambda'(t)$ is determined by
$e^{-5t} - e^{-3t} + 4e^{-4t}$, which is positive for small values and negative for large values of t.
Therefore, $\lambda(t)$ is initially strictly increasing, and then strictly decreasing. Hence,
F is not IFR.
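The rise and fall of this failure rate is easy to see numerically. The following sketch tabulates $\lambda(t)$ on a grid (the grid and the names are our own choices) and locates the point at which it peaks.

```python
import math


def failure_rate(t):
    """Failure rate of a parallel system of Exp(1) and Exp(2) components."""
    density = math.exp(-t) + 2 * math.exp(-2 * t) - 3 * math.exp(-3 * t)
    survival = math.exp(-t) + math.exp(-2 * t) - math.exp(-3 * t)
    return density / survival


grid = [i / 100 for i in range(0, 1001)]      # t = 0.00, 0.01, ..., 10.00
rates = [failure_rate(t) for t in grid]
t_peak = grid[rates.index(max(rates))]
```

The failure rate starts at 0, increases to a maximum at roughly t ≈ 1.44, and then decreases toward 1, the failure rate of the longer-lived component.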


Remark The result of the preceding example is quite surprising at first glance.
To obtain a better feel for it we need the concept of a mixture of distribution
functions. The distribution function G is said to be a mixture of the distributions
$G_1$ and $G_2$ if for some p, 0 < p < 1,

$$G(x) = p\, G_1(x) + (1 - p)\, G_2(x) \tag{9.16}$$

Mixtures occur when we sample from a population made up of two distinct
groups. For example, suppose we have a stockpile of items of which the fraction
p are type 1 and the fraction 1 − p are type 2. Suppose that the lifetime distribution
of type 1 items is $G_1$ and of type 2 items is $G_2$. If we choose an item at random
from the stockpile, then its life distribution is as given by Equation (9.16).

Consider now a mixture of two exponential distributions having rates $\lambda_1$ and
$\lambda_2$ where $\lambda_1 < \lambda_2$. We are interested in determining whether or not this mixture
distribution is IFR. To do so, we note that if the item selected has survived up
to time t, then its distribution of remaining life is still a mixture of the two
exponential distributions. This is so since its remaining life will still be exponential
with rate $\lambda_1$ if it is type 1 or with rate $\lambda_2$ if it is a type 2 item. However, the
probability that it is a type 1 item is no longer the (prior) probability p but is
now a conditional probability given that it has survived to time t. In fact, its
probability of being a type 1 is

$$P\{\text{type 1} \mid \text{life} > t\} = \frac{P\{\text{type 1, life} > t\}}{P\{\text{life} > t\}} = \frac{p\, e^{-\lambda_1 t}}{p\, e^{-\lambda_1 t} + (1 - p)\, e^{-\lambda_2 t}}$$

As the preceding is increasing in t, it follows that the larger t is, the more likely
it is that the item in use is a type 1 (the better one, since $\lambda_1 < \lambda_2$). Hence, the
older the item is, the less likely it is to fail, and thus the mixture of exponentials,
far from being IFR, is in fact DFR.
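A short numerical sketch makes the DFR property concrete. The failure rate of the mixture at age t is the average of the two rates, weighted by the conditional probabilities of the two types given survival to t. The function names and the parameter values (p = 0.5, rates 1 and 3) are ours, chosen only for illustration.

```python
import math


def posterior_type1(t, p, lam1, lam2):
    """P{type 1 | life > t} for a mixture of Exp(lam1) and Exp(lam2)."""
    a = p * math.exp(-lam1 * t)
    b = (1 - p) * math.exp(-lam2 * t)
    return a / (a + b)


def mixture_failure_rate(t, p, lam1, lam2):
    """Density over survival; equals the posterior-weighted average rate."""
    w = posterior_type1(t, p, lam1, lam2)
    return w * lam1 + (1 - w) * lam2


p, lam1, lam2 = 0.5, 1.0, 3.0
ages = [i / 4 for i in range(0, 21)]          # t = 0, 0.25, ..., 5
posteriors = [posterior_type1(t, p, lam1, lam2) for t in ages]
rates = [mixture_failure_rate(t, p, lam1, lam2) for t in ages]
```

The list `posteriors` increases from p toward 1, while `rates` decreases from $p\lambda_1 + (1-p)\lambda_2$ toward $\lambda_1$.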

Now, let us return to the parallel system of two exponential components having
respective rates $\lambda_1$ and $\lambda_2$. The lifetime of such a system can be expressed as the
sum of two independent random variables, namely,

$$\text{system life} = \text{Exp}(\lambda_1 + \lambda_2) + \begin{cases} \text{Exp}(\lambda_1) & \text{with probability } \dfrac{\lambda_2}{\lambda_1 + \lambda_2} \\[2ex] \text{Exp}(\lambda_2) & \text{with probability } \dfrac{\lambda_1}{\lambda_1 + \lambda_2} \end{cases}$$

The first random variable, whose distribution is exponential with rate $\lambda_1 + \lambda_2$,
represents the time until one of the components fails, and the second, which is a
mixture of exponentials, is the additional time until the other component fails.
(Why are these two random variables independent?)

Now, given that the system has survived a time t, it is very unlikely when t is
large that both components are still functioning, but instead it is far more likely
that one of the components has failed. Hence, for large t, the distribution of
remaining life is basically a mixture of two exponentials, and so as t becomes
even larger its failure rate should decrease (as indeed occurs).


Recall that the failure rate function of a distribution F(t) having density f(t) =
F'(t) is defined by

$$\lambda(t) = \frac{f(t)}{1 - F(t)}$$

By integrating both sides, we obtain

$$\int_0^t \lambda(s)\, ds = \int_0^t \frac{f(s)}{1 - F(s)}\, ds = -\log \bar{F}(t)$$

Hence,

$$\bar{F}(t) = e^{-\Lambda(t)} \tag{9.17}$$

where

$$\Lambda(t) = \int_0^t \lambda(s)\, ds$$

The function $\Lambda(t)$ is called the hazard function of the distribution F.
Definition 9.1 A distribution F is said to have increasing failure on the average
(IFRA) if

$$\frac{\Lambda(t)}{t} = \frac{\int_0^t \lambda(s)\, ds}{t} \tag{9.18}$$

increases in t for $t \ge 0$.

In other words, Equation (9.18) states that the average failure rate up to time t
increases as t increases. It is not difficult to show that if F is IFR, then F is IFRA;
but the reverse need not be true.

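Note that F is IFRA if $\Lambda(s)/s \le \Lambda(t)/t$ whenever $0 \le s \le t$, which is equivalent to

$$\frac{\Lambda(\alpha t)}{\alpha t} \le \frac{\Lambda(t)}{t} \qquad \text{for } 0 \le \alpha \le 1, \text{ all } t \ge 0$$

But by Equation (9.17) we see that $\Lambda(t) = -\log \bar{F}(t)$, and so the preceding is
equivalent to

$$-\log \bar{F}(\alpha t) \le -\alpha \log \bar{F}(t)$$

or equivalently,

$$\log \bar{F}(\alpha t) \ge \log \bar{F}^{\alpha}(t)$$

which, since log x is a monotone function of x, shows that F is IFRA if and
only if

$$\bar{F}(\alpha t) \ge \bar{F}^{\alpha}(t) \qquad \text{for } 0 \le \alpha \le 1, \text{ all } t \ge 0 \tag{9.19}$$

For a vector $\mathbf{p} = (p_1, \ldots, p_n)$ we define $\mathbf{p}^{\alpha} = (p_1^{\alpha}, \ldots, p_n^{\alpha})$. We shall need the
following proposition.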
Proposition 9.2 Any reliability function r(p) satisfies

$$r(\mathbf{p}^{\alpha}) \ge [r(\mathbf{p})]^{\alpha}, \qquad 0 \le \alpha \le 1$$

Proof. We prove this by induction on n, the number of components in the system.
If n = 1, then either $r(p) \equiv 0$, $r(p) \equiv 1$, or $r(p) \equiv p$. Hence, the proposition
follows in this case.

Assume that Proposition 9.2 is valid for all monotone systems of n − 1 components
and consider a system of n components having structure function $\phi$. By
conditioning upon whether or not the nth component is functioning, we obtain

$$r(\mathbf{p}^{\alpha}) = p_n^{\alpha}\, r(1_n, \mathbf{p}^{\alpha}) + (1 - p_n^{\alpha})\, r(0_n, \mathbf{p}^{\alpha}) \tag{9.20}$$

Now consider a system of components 1 through n − 1 having a structure
function $\phi_1(\mathbf{x}) = \phi(1_n, \mathbf{x})$. The reliability function for this system is given by
$r_1(\mathbf{p}) = r(1_n, \mathbf{p})$; hence, from the induction assumption (valid for all monotone
systems of n − 1 components), we have

$$r(1_n, \mathbf{p}^{\alpha}) \ge [r(1_n, \mathbf{p})]^{\alpha}$$

Similarly, by considering the system of components 1 through n − 1 and structure
function $\phi_0(\mathbf{x}) = \phi(0_n, \mathbf{x})$, we obtain

$$r(0_n, \mathbf{p}^{\alpha}) \ge [r(0_n, \mathbf{p})]^{\alpha}$$

Thus, from Equation (9.20), we obtain

$$r(\mathbf{p}^{\alpha}) \ge p_n^{\alpha} [r(1_n, \mathbf{p})]^{\alpha} + (1 - p_n^{\alpha}) [r(0_n, \mathbf{p})]^{\alpha}$$

which, by using the lemma to follow (with $\lambda = p_n$, $x = r(1_n, \mathbf{p})$, $y = r(0_n, \mathbf{p})$),
implies that

$$r(\mathbf{p}^{\alpha}) \ge [p_n\, r(1_n, \mathbf{p}) + (1 - p_n)\, r(0_n, \mathbf{p})]^{\alpha} = [r(\mathbf{p})]^{\alpha}$$

which proves the result.

Lemma 9.3 If $0 \le \alpha \le 1$, $0 \le \lambda \le 1$, then

$$h(y) = \lambda^{\alpha} x^{\alpha} + (1 - \lambda^{\alpha}) y^{\alpha} - (\lambda x + (1 - \lambda) y)^{\alpha} \ge 0$$

for all $0 \le y \le x$.

Proof. The proof is left as an exercise.
We are now ready to prove the following important theorem.


Theorem 9.2 For a monotone system of independent components, if each com- 
ponent has an IFRA lifetime distribution, then the distribution of system lifetime 
is itself IFRA. 


Proof. The distribution of system lifetime F is given by

$$\bar{F}(\alpha t) = r(\bar{F}_1(\alpha t), \ldots, \bar{F}_n(\alpha t))$$

Hence, since r is a monotone function, and since each of the component
distributions $\bar{F}_i$ is IFRA, we obtain from Equation (9.19)

$$\bar{F}(\alpha t) \ge r(\bar{F}_1^{\alpha}(t), \ldots, \bar{F}_n^{\alpha}(t)) \ge [r(\bar{F}_1(t), \ldots, \bar{F}_n(t))]^{\alpha} = \bar{F}^{\alpha}(t)$$

which by Equation (9.19) proves the theorem. The last inequality followed, of
course, from Proposition 9.2.


9.6 Expected System Lifetime 

In this section, we show how the mean lifetime of a system can be determined,
at least in theory, from a knowledge of the reliability function r(p) and the
component lifetime distributions $F_i$, i = 1, ..., n.

Since the system's lifetime will be t or larger if and only if the system is still
functioning at time t, we have

$$P\{\text{system life} > t\} = r(\bar{\mathbf{F}}(t))$$

where $\bar{\mathbf{F}}(t) = (\bar{F}_1(t), \ldots, \bar{F}_n(t))$. Hence, by a well-known formula that states that
for any nonnegative random variable X,

$$E[X] = \int_0^{\infty} P\{X > x\}\, dx,$$

we see that*

$$E[\text{system life}] = \int_0^{\infty} r(\bar{\mathbf{F}}(t))\, dt \tag{9.21}$$

* That $E[X] = \int_0^{\infty} P\{X > x\}\, dx$ can be shown as follows when X has density f:

$$\int_0^{\infty} P\{X > x\}\, dx = \int_0^{\infty} \int_x^{\infty} f(y)\, dy\, dx = \int_0^{\infty} \int_0^{y} f(y)\, dx\, dy = \int_0^{\infty} y f(y)\, dy = E[X]$$
Example 9.26 (A Series System of Uniformly Distributed Components) Consider
a series system of three independent components each of which functions for
an amount of time (in hours) uniformly distributed over (0, 10). Hence, $r(\mathbf{p}) = p_1 p_2 p_3$ and

$$F_i(t) = \begin{cases} t/10, & 0 \le t \le 10 \\ 1, & t > 10 \end{cases} \qquad i = 1, 2, 3$$

Therefore,

$$r(\bar{\mathbf{F}}(t)) = \begin{cases} \left(\dfrac{10 - t}{10}\right)^3, & 0 \le t \le 10 \\ 0, & t > 10 \end{cases}$$

and so from Equation (9.21) we obtain

$$E[\text{system life}] = \int_0^{10} \left(\frac{10 - t}{10}\right)^3 dt = 10 \int_0^1 y^3\, dy = \frac{5}{2}$$

Example 9.27 (A Two-out-of-Three System) Consider a two-out-of-three system
of independent components, in which each component's lifetime is (in months)
uniformly distributed over (0, 1). As was shown in Example 9.13, the reliability
of such a system is given by

$$r(\mathbf{p}) = p_1 p_2 + p_1 p_3 + p_2 p_3 - 2 p_1 p_2 p_3$$

Since

$$F_i(t) = \begin{cases} t, & 0 \le t \le 1 \\ 1, & t > 1 \end{cases}$$

we see from Equation (9.21) that

$$E[\text{system life}] = \int_0^1 \left[3(1 - t)^2 - 2(1 - t)^3\right] dt = \int_0^1 (3y^2 - 2y^3)\, dy = \frac{1}{2}$$
Example 9.28 (A Four-Component System) Consider the four-component system
that functions when components 1 and 2 and at least one of components 3 and
4 functions. Its structure function is given by

$$\phi(\mathbf{x}) = x_1 x_2 (x_3 + x_4 - x_3 x_4)$$

and thus its reliability function equals

$$r(\mathbf{p}) = p_1 p_2 (p_3 + p_4 - p_3 p_4)$$

Let us compute the mean system lifetime when the ith component is uniformly
distributed over (0, i), i = 1, 2, 3, 4. Now,

$$\bar{F}_i(t) = \begin{cases} 1 - t/i, & 0 \le t \le i \\ 0, & t > i \end{cases} \qquad i = 1, 2, 3, 4$$

Hence,

$$r(\bar{\mathbf{F}}(t)) = \begin{cases} (1 - t) \left(\dfrac{2 - t}{2}\right) \left[\dfrac{3 - t}{3} + \dfrac{4 - t}{4} - \dfrac{(3 - t)(4 - t)}{12}\right], & 0 \le t \le 1 \\ 0, & t > 1 \end{cases}$$

Therefore,

$$E[\text{system life}] = \frac{1}{24} \int_0^1 (1 - t)(2 - t)(12 - t^2)\, dt = \frac{593}{(24)(60)} \approx 0.41$$
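The three preceding examples can be reproduced by integrating $r(\bar{\mathbf{F}}(t))$ numerically, as in Equation (9.21). The sketch below uses a hand-written composite Simpson rule so that it needs no external libraries; the helper names (`simpson`, `uniform_sf`, `expected_life`) are ours.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def uniform_sf(upper):
    """Survival function of a lifetime uniform on (0, upper)."""
    return lambda t: max(0.0, 1.0 - t / upper)


def expected_life(r, survival_functions, t_max):
    """Equation (9.21), integrated up to t_max (beyond which r is zero here)."""
    return simpson(lambda t: r([sf(t) for sf in survival_functions]), 0.0, t_max)


def series3(p):
    return p[0] * p[1] * p[2]


def two_of_three(p):
    return p[0] * p[1] + p[0] * p[2] + p[1] * p[2] - 2 * p[0] * p[1] * p[2]


def four_component(p):
    return p[0] * p[1] * (p[2] + p[3] - p[2] * p[3])


e_926 = expected_life(series3, [uniform_sf(10)] * 3, 10.0)
e_927 = expected_life(two_of_three, [uniform_sf(1)] * 3, 1.0)
e_928 = expected_life(four_component, [uniform_sf(i) for i in (1, 2, 3, 4)], 1.0)
```

The three results agree with the closed-form values 5/2, 1/2, and 593/1440 obtained in Examples 9.26 through 9.28.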


We end this section by obtaining the mean lifetime of a k-out-of-n system
of independent identically distributed exponential components. If $\theta$ is the mean
lifetime of each component, then

$$\bar{F}_i(t) = e^{-t/\theta}$$

Hence, since for a k-out-of-n system,

$$r(p, p, \ldots, p) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i}$$

we obtain from Equation (9.21)

$$E[\text{system life}] = \int_0^{\infty} \sum_{i=k}^{n} \binom{n}{i} (e^{-t/\theta})^i (1 - e^{-t/\theta})^{n-i}\, dt$$

Making the substitution

$$y = e^{-t/\theta}, \qquad dy = -\frac{1}{\theta} e^{-t/\theta}\, dt = -\frac{y}{\theta}\, dt$$

yields

$$E[\text{system life}] = \theta \sum_{i=k}^{n} \binom{n}{i} \int_0^1 y^{i-1} (1 - y)^{n-i}\, dy$$

Now, it is not difficult to show that*

$$\int_0^1 y^n (1 - y)^m\, dy = \frac{m!\, n!}{(m + n + 1)!} \tag{9.22}$$

Thus, the foregoing equals

$$E[\text{system life}] = \theta \sum_{i=k}^{n} \frac{n!}{(n - i)!\, i!} \cdot \frac{(i - 1)!\, (n - i)!}{n!} = \theta \sum_{i=k}^{n} \frac{1}{i} \tag{9.23}$$

Remark Equation (9.23) could have been proven directly by making use of special
properties of the exponential distribution. First note that the lifetime of a
k-out-of-n system can be written as $T_1 + \cdots + T_{n-k+1}$, where $T_i$ represents the
time between the (i − 1)st and ith failure. This is true since $T_1 + \cdots + T_{n-k+1}$
equals the time at which the (n − k + 1)st component fails, which is also the first
time that the number of functioning components is less than k. Now, when all
n components are functioning, the rate at which failures occur is $n/\theta$. That is,
$T_1$ is exponentially distributed with mean $\theta/n$. Similarly, since $T_i$ represents the
time until the next failure when there are n − (i − 1) functioning components,
it follows that $T_i$ is exponentially distributed with mean $\theta/(n - i + 1)$. Hence,
the mean system lifetime equals

$$E[T_1 + \cdots + T_{n-k+1}] = \theta \left[\frac{1}{n} + \cdots + \frac{1}{k}\right]$$

Note also that it follows, from the lack of memory of the exponential, that the
$T_i$, i = 1, ..., n − k + 1, are independent random variables.

* Let

$$C(n, m) = \int_0^1 y^n (1 - y)^m\, dy$$

Integration by parts yields C(n, m) = [m/(n + 1)] C(n + 1, m − 1). Starting with C(n, 0) = 1/(n + 1),
Equation (9.22) follows by mathematical induction.
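The two routes to Equation (9.23) can be compared in exact rational arithmetic. In the sketch below (function names are ours), `mean_life_harmonic` evaluates $\theta \sum_{i=k}^{n} 1/i$, `mean_life_spacings` adds up the means $\theta/(n - i + 1)$ of the times between successive failures, and `beta_term` checks the simplification of a single term via (9.22).

```python
from fractions import Fraction
from math import comb, factorial


def mean_life_harmonic(n, k, theta=1):
    """Equation (9.23): theta * (1/k + 1/(k+1) + ... + 1/n)."""
    return theta * sum(Fraction(1, i) for i in range(k, n + 1))


def mean_life_spacings(n, k, theta=1):
    """Sum of E[T_i] = theta / (n - i + 1) for i = 1, ..., n - k + 1."""
    return sum(theta * Fraction(1, n - i + 1) for i in range(1, n - k + 2))


def beta_term(n, i):
    """binom(n, i) times the integral of y^(i-1) (1-y)^(n-i), using (9.22)."""
    integral = Fraction(factorial(i - 1) * factorial(n - i), factorial(n))
    return comb(n, i) * integral


life_2_of_5 = mean_life_harmonic(5, 2)
```

For a two-out-of-five system with $\theta = 1$, both routes give 1/2 + 1/3 + 1/4 + 1/5 = 77/60.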


9.6.1 An Upper Bound on the Expected Life of a Parallel System 


Consider a parallel system of n components, whose lifetimes are not necessarily
independent. The system lifetime can be expressed as

$$\text{system life} = \max_i X_i$$

where $X_i$ is the lifetime of component i, i = 1, ..., n. We can bound the expected
system lifetime by making use of the following inequality. Namely, for any
constant c

$$\max_i X_i \le c + \sum_{i=1}^{n} (X_i - c)^+ \tag{9.24}$$

where $x^+$, the positive part of x, is equal to x if x > 0 and is equal to 0 if $x \le 0$.
The validity of Inequality (9.24) is immediate since if $\max_i X_i < c$ then the left
side is equal to $\max_i X_i$ and the right side is equal to c. On the other hand, if
$X_{(n)} = \max_i X_i > c$ then the right side is at least as large as $c + (X_{(n)} - c) = X_{(n)}$.
It follows from Inequality (9.24), upon taking expectations, that

$$E\left[\max_i X_i\right] \le c + \sum_{i=1}^{n} E[(X_i - c)^+] \tag{9.25}$$

Now, $(X_i - c)^+$ is a nonnegative random variable and so

$$E[(X_i - c)^+] = \int_0^{\infty} P\{(X_i - c)^+ > x\}\, dx = \int_0^{\infty} P\{X_i - c > x\}\, dx = \int_c^{\infty} P\{X_i > y\}\, dy$$
Thus, we obtain

$$E\left[\max_i X_i\right] \le c + \sum_{i=1}^{n} \int_c^{\infty} P\{X_i > y\}\, dy \tag{9.26}$$

Because the preceding is true for all c, it follows that we obtain the best bound
by letting c equal the value that minimizes the right side of the preceding. To
determine that value, differentiate the right side of the preceding and set the
result equal to 0, to obtain

$$1 - \sum_{i=1}^{n} P\{X_i > c\} = 0$$

That is, the minimizing value of c is that value c* for which

$$\sum_{i=1}^{n} P\{X_i > c^*\} = 1$$

Since $\sum_{i=1}^{n} P\{X_i > c\}$ is a decreasing function of c, the value of c* can be easily
approximated and then utilized in Inequality (9.26). Also, it is interesting to note
that c* is such that the expected number of the $X_i$ that exceed c* is equal to 1
(see Exercise 32). That the optimal value of c has this property is interesting and
somewhat intuitive inasmuch as Inequality (9.24) is an equality when exactly
one of the $X_i$ exceeds c.


Example 9.29 Suppose the lifetime of component i is exponentially distributed
with rate $\lambda_i$, i = 1, ..., n. Then the minimizing value of c is such that

$$1 = \sum_{i=1}^{n} P\{X_i > c^*\} = \sum_{i=1}^{n} e^{-\lambda_i c^*}$$

and the resulting bound of the mean system life is

$$E\left[\max_i X_i\right] \le c^* + \sum_{i=1}^{n} E[(X_i - c^*)^+] = c^* + \sum_{i=1}^{n} \frac{1}{\lambda_i} e^{-\lambda_i c^*}$$
In the special case where all the rates are equal, say, $\lambda_i = \lambda$, i = 1, ..., n, then

$$1 = n e^{-\lambda c^*} \qquad \text{or} \qquad c^* = \frac{1}{\lambda} \log(n)$$

and the bound is

$$E\left[\max_i X_i\right] \le \frac{1}{\lambda} (\log(n) + 1)$$

That is, if $X_1, \ldots, X_n$ are identically distributed exponential random variables
with rate $\lambda$, then the preceding gives a bound on the expected value of their
maximum. In the special case where these random variables are also independent,
the following exact expression, given by Equation (9.23), is not much less than
the preceding upper bound:

$$E\left[\max_i X_i\right] = \frac{1}{\lambda} \sum_{i=1}^{n} \frac{1}{i} \approx \frac{1}{\lambda} \int_1^n \frac{1}{x}\, dx = \frac{1}{\lambda} \log(n)$$
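For exponential lifetimes with unequal rates, c* has no closed form, but because $\sum_i e^{-\lambda_i c}$ is decreasing in c it can be found by bisection. The sketch below (function names and tolerance are our choices) computes c* and the resulting bound, and compares the equal-rate case with the closed forms above.

```python
import math


def optimal_c(rates, tol=1e-12):
    """Solve sum_i exp(-rate_i * c) = 1 for c >= 0 by bisection."""
    def excess(c):
        return sum(math.exp(-lam * c) for lam in rates) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:          # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_life_bound(rates):
    """Upper bound c* + sum_i exp(-rate_i c*) / rate_i on E[max_i X_i]."""
    c = optimal_c(rates)
    return c + sum(math.exp(-lam * c) / lam for lam in rates)


rates = [2.0] * 10
c_star = optimal_c(rates)
bound = max_life_bound(rates)
exact_if_independent = sum(1.0 / i for i in range(1, 11)) / 2.0
```

With ten components of rate 2, `c_star` matches $\log(10)/2$, `bound` matches $(\log(10) + 1)/2$, and the exact mean under independence is somewhat smaller than the bound.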


9.7 Systems with Repair 
Consider an n-component system having reliability function r(p). Suppose that
component i functions for an exponentially distributed time with rate $\lambda_i$ and
then fails; once failed it takes an exponential time with rate $\mu_i$ to be repaired,
i = 1, ..., n. All components act independently.

Let us suppose that all components are initially working, and let

$$A(t) = P\{\text{system is working at } t\}$$

A(t) is called the availability at time t. Since the components act independently,
A(t) can be expressed in terms of the reliability function as follows:

$$A(t) = r(A_1(t), \ldots, A_n(t)) \tag{9.27}$$

where

$$A_i(t) = P\{\text{component } i \text{ is functioning at } t\}$$

Now the state of component i, either on or off, changes in accordance
with a two-state continuous time Markov chain. Hence, from the results of
Example 6.12 we have

$$A_i(t) = P_{00}(t) = \frac{\mu_i}{\mu_i + \lambda_i} + \frac{\lambda_i}{\mu_i + \lambda_i} e^{-(\lambda_i + \mu_i) t}$$

Thus, we obtain

$$A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\mu} + \boldsymbol{\lambda}} + \frac{\boldsymbol{\lambda}}{\boldsymbol{\mu} + \boldsymbol{\lambda}} e^{-(\boldsymbol{\lambda} + \boldsymbol{\mu}) t}\right)$$

If we let t approach $\infty$, then we obtain the limiting availability, call it A, which
is given by

$$A = \lim_{t \to \infty} A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)$$
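As an illustrative sketch, the availability can be computed for any reliability function by plugging the component availabilities into r. The function names and the sample rates below are ours; `series` and `parallel` are the two standard structures.

```python
import math


def component_availability(t, lam, mu):
    """A_i(t) for exponential up times (rate lam) and repair times (rate mu)."""
    return mu / (mu + lam) + lam / (mu + lam) * math.exp(-(lam + mu) * t)


def availability(r, t, lams, mus):
    """A(t) = r(A_1(t), ..., A_n(t)), Equation (9.27)."""
    return r([component_availability(t, l, m) for l, m in zip(lams, mus)])


def limiting_availability(r, lams, mus):
    """A = r(mu / (lam + mu)), evaluated componentwise."""
    return r([m / (l + m) for l, m in zip(lams, mus)])


def series(p):
    return math.prod(p)


def parallel(p):
    return 1 - math.prod(1 - x for x in p)


lams, mus = [1.0, 2.0], [3.0, 4.0]
a_series = limiting_availability(series, lams, mus)      # (3/4)(4/6)
a_parallel = limiting_availability(parallel, lams, mus)  # 1 - (1/4)(2/6)
```

At t = 0 the availability is 1, since all components start in the working state, and for large t it settles at the limiting value.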


Remarks 


(i) If the on and off distributions for component i are arbitrary continuous distributions
with respective means $1/\lambda_i$ and $1/\mu_i$, i = 1, ..., n, then it follows from the theory of
alternating renewal processes (see Section 7.5.1) that

$$A_i(t) \to \frac{1/\lambda_i}{1/\lambda_i + 1/\mu_i} = \frac{\mu_i}{\mu_i + \lambda_i}$$

and thus using the continuity of the reliability function, it follows from (9.27) that
the limiting availability is

$$A = \lim_{t \to \infty} A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\mu} + \boldsymbol{\lambda}}\right)$$


Hence, A depends only on the on and off distributions through their means. 

(ii) It can be shown (using the theory of regenerative processes as presented in Section 7.5) 
that A will also equal the long-run proportion of time that the system will be 
functioning. 


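Example 9.30 For a series system, $r(\mathbf{p}) = \prod_{i=1}^{n} p_i$ and so

$$A(t) = \prod_{i=1}^{n} \left[\frac{\mu_i}{\mu_i + \lambda_i} + \frac{\lambda_i}{\mu_i + \lambda_i} e^{-(\lambda_i + \mu_i) t}\right]$$

and

$$A = \prod_{i=1}^{n} \frac{\mu_i}{\mu_i + \lambda_i}$$

Example 9.31 For a parallel system, $r(\mathbf{p}) = 1 - \prod_{i=1}^{n} (1 - p_i)$ and thus

$$A(t) = 1 - \prod_{i=1}^{n} \left[\frac{\lambda_i}{\mu_i + \lambda_i} \left(1 - e^{-(\lambda_i + \mu_i) t}\right)\right]$$

and

$$A(\infty) = 1 - \prod_{i=1}^{n} \frac{\lambda_i}{\mu_i + \lambda_i}$$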

The preceding system will alternate between periods when it is up and periods
when it is down. Let us denote by $U_i$ and $D_i$, $i \ge 1$, the lengths of the ith up and
down period respectively. For instance in a two-out-of-three system, $U_1$ will be
the time until two components are down; $D_1$, the additional time until two are
up; $U_2$ the additional time until two are down, and so on. Let

$$\bar{U} = \lim_{n \to \infty} \frac{U_1 + \cdots + U_n}{n}, \qquad \bar{D} = \lim_{n \to \infty} \frac{D_1 + \cdots + D_n}{n}$$

denote the average length of an up and down period respectively.*

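To determine $\bar{U}$ and $\bar{D}$, note first that in the first n up-down cycles, that is,
in time $\sum_{i=1}^{n} (U_i + D_i)$, the system will be up for a time $\sum_{i=1}^{n} U_i$. Hence, the
proportion of time the system will be up in the first n up-down cycles is

$$\frac{U_1 + \cdots + U_n}{U_1 + \cdots + U_n + D_1 + \cdots + D_n} = \frac{\sum_{i=1}^{n} U_i / n}{\sum_{i=1}^{n} U_i / n + \sum_{i=1}^{n} D_i / n}$$

As $n \to \infty$, this must converge to A, the long-run proportion of time the system
is up. Hence,

$$\frac{\bar{U}}{\bar{U} + \bar{D}} = A = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) \tag{9.28}$$

However, to solve for $\bar{U}$ and $\bar{D}$ we need a second equation. To obtain one
consider the rate at which the system fails. As there will be n failures in time
$\sum_{i=1}^{n} (U_i + D_i)$, it follows that the rate at which the system fails is

$$\text{rate at which system fails} = \lim_{n \to \infty} \frac{n}{\sum_{i=1}^{n} U_i + \sum_{i=1}^{n} D_i} = \lim_{n \to \infty} \frac{1}{\sum_{i=1}^{n} U_i / n + \sum_{i=1}^{n} D_i / n} = \frac{1}{\bar{U} + \bar{D}} \tag{9.29}$$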


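That is, the foregoing yields the intuitive result that, on average, there is one
failure every $\bar{U} + \bar{D}$ time units. To utilize this, let us determine the rate at which
a failure of component i causes the system to go from up to down. Now, the
system will go from up to down when component i fails if the states of the other
components $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ are such that $\phi(1_i, \mathbf{x}) = 1$, $\phi(0_i, \mathbf{x}) = 0$.
That is, the states of the other components must be such that

$$\phi(1_i, \mathbf{x}) - \phi(0_i, \mathbf{x}) = 1 \tag{9.30}$$

* It can be shown using the theory of regenerative processes that, with probability 1, the preceding
limits will exist and will be constants.

Since component i will, on average, have one failure every $1/\lambda_i + 1/\mu_i$ time units,
it follows that the rate at which component i fails is equal to $(1/\lambda_i + 1/\mu_i)^{-1} = \lambda_i \mu_i / (\lambda_i + \mu_i)$.
In addition, the states of the other components will be such that
(9.30) holds with probability

$$P\{\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty)) = 1\} = E[\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty))] = r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)$$

where the first equality holds since $\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty))$ is a Bernoulli random variable.
Hence, putting the preceding together we see that

$$\text{rate at which component } i \text{ causes the system to fail} = \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right]$$

Summing this over all components i thus gives

$$\text{rate at which system fails} = \sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right]$$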


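Finally, equating the preceding with (9.29) yields

$$\frac{1}{\bar{U} + \bar{D}} = \sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right] \tag{9.31}$$

Solving (9.28) and (9.31), we obtain

$$\bar{U} = \frac{r\left(\dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)}{\displaystyle\sum_{i=1}^{n} \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right]} \tag{9.32}$$

$$\bar{D} = \frac{\left[1 - r\left(\dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right] \bar{U}}{r\left(\dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)} \tag{9.33}$$

Also, (9.31) yields the rate at which the system fails.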
Remark In establishing the formulas for $\bar{U}$ and $\bar{D}$, we did not make use of
the assumption of exponential on and off times and in fact, our derivation is
valid and Equations (9.32) and (9.33) hold whenever $\bar{U}$ and $\bar{D}$ are well defined
(a sufficient condition is that all on and off distributions are continuous). The
quantities $\lambda_i$, $\mu_i$, i = 1, ..., n, will represent, respectively, the reciprocals of the
mean lifetimes and mean repair times.


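Example 9.32 For a series system,

$$\bar{U} = \frac{\prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}}{\sum_i \dfrac{\lambda_i \mu_i}{\lambda_i + \mu_i} \prod_{j \ne i} \dfrac{\mu_j}{\mu_j + \lambda_j}} = \frac{1}{\sum_i \lambda_i}$$

$$\bar{D} = \frac{1 - \prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}}{\prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}} \times \frac{1}{\sum_i \lambda_i}$$

whereas for a parallel system,

$$\bar{U} = \frac{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{\sum_i \dfrac{\lambda_i \mu_i}{\lambda_i + \mu_i} \prod_{j \ne i} \dfrac{\lambda_j}{\mu_j + \lambda_j}} = \frac{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{\prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}} \times \frac{1}{\sum_i \mu_i}$$

$$\bar{D} = \frac{\prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}\, \bar{U} = \frac{1}{\sum_i \mu_i}$$

The preceding formulas hold for arbitrary continuous up and down distributions
with $1/\lambda_i$ and $1/\mu_i$ denoting respectively the mean up and down times of
component i, i = 1, ..., n.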
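Equations (9.32) and (9.33) can be evaluated for any structure whose reliability function is available as code. The sketch below (function names and sample rates are ours) does this generically and can be compared with the closed forms for the series and parallel cases.

```python
import math


def up_down_means(r, lams, mus):
    """Average up and down period lengths from Equations (9.32) and (9.33).

    r maps a list of component reliabilities to the system reliability;
    lams[i] and mus[i] are the reciprocals of component i's mean up time
    and mean repair time.
    """
    p = [m / (l + m) for l, m in zip(lams, mus)]
    A = r(p)
    failure_rate = 0.0
    for i, (l, m) in enumerate(zip(lams, mus)):
        with_i_up = p[:i] + [1.0] + p[i + 1:]
        with_i_down = p[:i] + [0.0] + p[i + 1:]
        failure_rate += l * m / (l + m) * (r(with_i_up) - r(with_i_down))
    U = A / failure_rate
    D = (1 - A) * U / A
    return U, D


def series(p):
    return math.prod(p)


def parallel(p):
    return 1 - math.prod(1 - x for x in p)


lams, mus = [1.0, 2.0], [3.0, 4.0]
u_series, d_series = up_down_means(series, lams, mus)
u_parallel, d_parallel = up_down_means(parallel, lams, mus)
```

For these rates the series system has $\bar{U} = 1/(\lambda_1 + \lambda_2) = 1/3$, and the parallel system has $\bar{D} = 1/(\mu_1 + \mu_2) = 1/7$, as the closed forms require.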


9.7.1 A Series Model with Suspended Animation 


Consider a series system consisting of n components, and suppose that whenever a
component (and thus the system) goes down, repair begins on that component and
each of the other components enters a state of suspended animation. That is,
after the down component is repaired, the other components resume operation
in exactly the same condition as when the failure occurred. If two or more
components go down simultaneously, one of them is arbitrarily chosen as being the
failed component and repair on that component begins; the others that went
down at the same time are considered to be in a state of suspended animation,
and they will instantaneously go down when the repair is completed. We suppose
that (not counting any time in suspended animation) the distribution of time that
component i functions is $F_i$ with mean $u_i$, whereas its repair distribution is $G_i$
with mean $d_i$, i = 1, ..., n.

To determine the long-run proportion of time this system is working, we reason
as follows. To begin, consider the time, call it T, at which the system has been up
for a time t. Now, when the system is up, the failure times of component i
constitute a renewal process with mean interarrival time $u_i$. Therefore, it follows that

$$\text{number of failures of } i \text{ in time } T \approx \frac{t}{u_i}$$

As the average repair time of i is $d_i$, the preceding implies that

$$\text{total repair time of } i \text{ in time } T \approx \frac{t\, d_i}{u_i}$$

Therefore, in the period of time in which the system has been up for a time t,
the total system downtime has approximately been

$$t \sum_{i=1}^{n} d_i / u_i$$

Hence, the proportion of time that the system has been up is approximately

$$\frac{t}{t + t \sum_{i=1}^{n} d_i / u_i}$$

Because this approximation should become exact as we let t become larger, it
follows that

$$\text{proportion of time the system is up} = \frac{1}{1 + \sum_j d_j / u_j} \tag{9.34}$$

which also shows that

$$\text{proportion of time the system is down} = 1 - \text{proportion of time the system is up} = \frac{\sum_j d_j / u_j}{1 + \sum_j d_j / u_j}$$

Moreover, in the time interval from 0 to T, the proportion of the repair time
that has been devoted to component i is approximately

$$\frac{t\, d_i / u_i}{\sum_j t\, d_j / u_j}$$

Thus, in the long run,

$$\text{proportion of down time that is due to component } i = \frac{d_i / u_i}{\sum_j d_j / u_j}$$
Multiplying the preceding by the proportion of time the system is down gives

$$\text{proportion of time component } i \text{ is being repaired} = \frac{d_i/u_i}{1 + \sum_{j} d_j/u_j}$$

Also, since component $j$ will be in suspended animation whenever any of the
other components is in repair, we see that

$$\text{proportion of time component } j \text{ is in suspended animation} = \frac{\sum_{i \ne j} d_i/u_i}{1 + \sum_{i} d_i/u_i}$$


Another quantity of interest is the long-run rate at which the system fails. Since
component $i$ fails at rate $1/u_i$ when the system is up, and does not fail when the
system is down, it follows that

$$\text{rate at which } i \text{ fails} = \frac{\text{proportion of time system is up}}{u_i} = \frac{1/u_i}{1 + \sum_{j} d_j/u_j}$$

Since the system fails when any of its components fail, the preceding yields that

$$\text{rate at which the system fails} = \frac{\sum_{i} 1/u_i}{1 + \sum_{i} d_i/u_i} \tag{9.35}$$


If we partition the time axis into periods when the system is up and those when
it is down, we can determine the average length of an up period by noting that if
$U(t)$ is the total amount of time that the system is up in the interval $[0, t]$, and if
$N(t)$ is the number of failures by time $t$, then

$$\text{average length of an up period} = \lim_{t \to \infty} \frac{U(t)}{N(t)} = \lim_{t \to \infty} \frac{U(t)/t}{N(t)/t} = \frac{1}{\sum_{i} 1/u_i}$$

where the final equality used Equations (9.34) and (9.35). Also, in a similar
manner it can be shown that

$$\text{average length of a down period} = \frac{\sum_{i} d_i/u_i}{\sum_{i} 1/u_i} \tag{9.36}$$
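The long-run quantities derived above depend only on the means $u_i$ and $d_i$, so they are easy to evaluate numerically. The following is a minimal sketch, not part of the original text; the function name `series_repair_metrics` and the sample means are illustrative assumptions.

```python
def series_repair_metrics(u, d):
    """Long-run quantities for the series model with suspended animation.

    u[i] is the mean working time of component i and d[i] its mean repair
    time (both hypothetical inputs supplied by the caller).
    """
    ratio_sum = sum(di / ui for ui, di in zip(u, d))   # sum_i d_i / u_i
    rate_sum = sum(1.0 / ui for ui in u)               # sum_i 1 / u_i
    up = 1.0 / (1.0 + ratio_sum)                       # Equation (9.34)
    return {
        "up": up,
        "down": 1.0 - up,
        "failure_rate": rate_sum * up,                 # Equation (9.35)
        "mean_up_period": 1.0 / rate_sum,
        "mean_down_period": ratio_sum / rate_sum,      # Equation (9.36)
    }


# Two components with made-up means: u = (10, 20), d = (1, 2).
m = series_repair_metrics(u=[10.0, 20.0], d=[1.0, 2.0])
print(m)
```

A useful sanity check is that the proportion of up time equals the mean up period divided by the mean length of a full up-down cycle.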
Exercises


1. Prove that, for any structure function $\phi$,

$$\phi(\mathbf{x}) = x_i \phi(1_i, \mathbf{x}) + (1 - x_i) \phi(0_i, \mathbf{x})$$

where

$$(1_i, \mathbf{x}) = (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n),$$
$$(0_i, \mathbf{x}) = (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$$

2. Show that
(a) if $\phi(0, 0, \ldots, 0) = 0$ and $\phi(1, 1, \ldots, 1) = 1$, then

$$\min_i x_i \le \phi(\mathbf{x}) \le \max_i x_i$$

(b) $\phi(\max(\mathbf{x}, \mathbf{y})) \ge \max(\phi(\mathbf{x}), \phi(\mathbf{y}))$
(c) $\phi(\min(\mathbf{x}, \mathbf{y})) \le \min(\phi(\mathbf{x}), \phi(\mathbf{y}))$


3. For any structure function $\phi$, we define the dual structure $\phi^D$ by

$$\phi^D(\mathbf{x}) = 1 - \phi(\mathbf{1} - \mathbf{x})$$

(a) Show that the dual of a parallel (series) system is a series (parallel) system.

(b) Show that the dual of a dual structure is the original structure.

(c) What is the dual of a $k$-out-of-$n$ structure?

(d) Show that a minimal path (cut) set of the dual system is a minimal cut (path)
set of the original structure.


*4. Write the structure function corresponding to the following:
(a) Figure 9.16
(b) Figure 9.17
(c) Figure 9.18


5. Find the minimal path and minimal cut sets for:
(a) Figure 9.19
(b) Figure 9.20


*6. The minimal path sets are {1, 2, 4}, {1, 3, 5}, and {5, 6}. Give the minimal cut sets.

7. The minimal cut sets are {1, 2, 3}, {2, 3, 4}, and {3, 5}. What are the minimal path
sets?

8. Give the minimal path sets and the minimal cut sets for the structure given by
Figure 9.21.


Figure 9.21


9. Component $i$ is said to be relevant to the system if for some state vector $\mathbf{x}$,

$$\phi(1_i, \mathbf{x}) = 1, \quad \phi(0_i, \mathbf{x}) = 0$$

Otherwise, it is said to be irrelevant.

(a) Explain in words what it means for a component to be irrelevant.

(b) Let $A_1, \ldots, A_s$ be the minimal path sets of a system, and let $S$ denote the set
of components. Show that $S = \bigcup_{i=1}^{s} A_i$ if and only if all components are
relevant.

(c) Let $C_1, \ldots, C_k$ denote the minimal cut sets. Show that $S = \bigcup_{i=1}^{k} C_i$ if and only
if all components are relevant.


10. Let $t_i$ denote the time of failure of the $i$th component; let $\tau_\phi(\mathbf{t})$ denote the time to
failure of the system $\phi$ as a function of the vector $\mathbf{t} = (t_1, \ldots, t_n)$. Show that

$$\max_{1 \le j \le s} \min_{i \in A_j} t_i = \tau_\phi(\mathbf{t}) = \min_{1 \le j \le k} \max_{i \in C_j} t_i$$

where $C_1, \ldots, C_k$ are the minimal cut sets, and $A_1, \ldots, A_s$ the minimal path sets.

11. Give the reliability function of the structure of Exercise 8.

12. Give the minimal path sets and the reliability function for the structure in
Figure 9.22.


Figure 9.22


13. Let $r(\mathbf{p})$ be the reliability function. Show that

$$r(\mathbf{p}) = p_i r(1_i, \mathbf{p}) + (1 - p_i) r(0_i, \mathbf{p})$$

14. Compute the reliability function of the bridge system (see Figure 9.11) by
conditioning upon whether or not component 3 is working.

15. Compute upper and lower bounds of the reliability function (using Method 2) for
the systems given in Exercise 4, and compare them with the exact values when
$p_i \equiv \frac{1}{2}$.

16. Compute the upper and lower bounds of $r(\mathbf{p})$ using both methods for the

(a) two-out-of-three system and

(b) two-out-of-four system.

(c) Compare these bounds with the exact reliability when

(i) $p_i \equiv 0.5$
(ii) $p_i \equiv 0.8$
(iii) $p_i \equiv 0.2$

*17. Let $N$ be a nonnegative, integer-valued random variable. Show that

$$P\{N > 0\} \ge \frac{(E[N])^2}{E[N^2]}$$

and explain how this inequality can be used to derive additional bounds on a
reliability function.

Hint:

$$E[N^2] = E[N^2 \mid N > 0] P\{N > 0\} \quad \text{(Why?)}$$
$$\ge (E[N \mid N > 0])^2 P\{N > 0\} \quad \text{(Why?)}$$

Now multiply both sides by $P\{N > 0\}$.

18. Consider a structure in which the minimal path sets are {1, 2, 3} and {3, 4, 5}.

(a) What are the minimal cut sets?

(b) If the component lifetimes are independent uniform (0, 1) random variables,
determine the probability that the system life will be less than $\frac{1}{2}$.

19. Let $X_1, X_2, \ldots, X_n$ denote independent and identically distributed random variables
and define the order statistics $X_{(1)}, \ldots, X_{(n)}$ by

$$X_{(i)} \equiv i\text{th smallest of } X_1, \ldots, X_n$$

Show that if the distribution of $X_j$ is IFR, then so is the distribution of $X_{(i)}$.
Hint: Relate this to one of the examples of this chapter.

20. Let $F$ be a continuous distribution function. For some positive $\alpha$, define the
distribution function $G$ by

$$\bar{G}(t) = (\bar{F}(t))^{\alpha}$$

Find the relationship between $\lambda_G(t)$ and $\lambda_F(t)$, the respective failure rate functions
of $G$ and $F$.
21. Consider the following four structures:

(i) Figure 9.23
(ii) Figure 9.24
(iii) Figure 9.25
(iv) Figure 9.26

Let $F_1$, $F_2$, and $F_3$ be the corresponding component failure distributions, each
of which is assumed to be IFR (increasing failure rate). Let $F$ be the system failure
distribution. All components are independent.

(a) For which structures is $F$ necessarily IFR if $F_1 = F_2 = F_3$? Give reasons.
(b) For which structures is $F$ necessarily IFR if $F_2 = F_3$? Give reasons.
(c) For which structures is $F$ necessarily IFR if $F_1 \ne F_2 \ne F_3$? Give reasons.

*22. Let $X$ denote the lifetime of an item. Suppose the item has reached the age of $t$. Let
$X_t$ denote its remaining life and define

$$\bar{F}_t(a) = P\{X_t > a\}$$

In words, $\bar{F}_t(a)$ is the probability that a $t$-year-old item survives an additional time $a$.

Show that

(a) $\bar{F}_t(a) = \bar{F}(t + a)/\bar{F}(t)$ where $F$ is the distribution function of $X$.

(b) Another definition of IFR is to say that $F$ is IFR if $\bar{F}_t(a)$ decreases in $t$,
for all $a$. Show that this definition is equivalent to the one given in the text
when $F$ has a density.

23. Show that if each (independent) component of a series system has an IFR
distribution, then the system lifetime is itself IFR by
(a) showing that

$$\lambda_F(t) = \sum_i \lambda_i(t)$$

where $\lambda_F(t)$ is the failure rate function of the system, and $\lambda_i(t)$ the failure rate
function of the lifetime of component $i$.
(b) using the definition of IFR given in Exercise 22.

24. Show that if $F$ is IFR, then it is also IFRA, and show by counterexample that the
reverse is not true.

*25. We say that $\xi$ is a $p$-percentile of the distribution $F$ if $F(\xi) = p$. Show that if $\xi$ is a
$p$-percentile of the IFRA distribution $F$, then

$$\bar{F}(x) \le e^{-\theta x}, \quad x \ge \xi$$
$$\bar{F}(x) \ge e^{-\theta x}, \quad x \le \xi$$

where

$$\theta = \frac{-\log(1 - p)}{\xi}$$

26. Prove Lemma 9.3.

Hint: Let $x = y + \delta$. Note that $f(t) = t^{\alpha}$ is a concave function when
$0 \le \alpha \le 1$, and use the fact that for a concave function $f(t + h) - f(t)$ is
decreasing in $t$.

27. Let $r(p) = r(p, p, \ldots, p)$. Show that if $r(p_0) = p_0$, then

$$r(p) \ge p \quad \text{for } p \ge p_0$$
$$r(p) \le p \quad \text{for } p \le p_0$$

Hint: Use Proposition 9.2.

28. Find the mean lifetime of a series system of two components when the component 
lifetimes are respectively uniform on (0,1) and uniform on (0,2). Repeat for a 
parallel system. 

29. Show that the mean lifetime of a parallel system of two components is

$$\frac{1}{\mu_1 + \mu_2} + \frac{\mu_1}{(\mu_1 + \mu_2)\mu_2} + \frac{\mu_2}{(\mu_1 + \mu_2)\mu_1}$$

when the first component is exponentially distributed with mean $1/\mu_1$ and the
second is exponential with mean $1/\mu_2$.
*30. Compute the expected system lifetime of a three-out-of-four system when the first 
two component lifetimes are uniform on (0, 1) and the second two are uniform on 
(0, 2). 
31. Show that the variance of the lifetime of a $k$-out-of-$n$ system of components, each
of whose lifetimes is exponential with mean $\theta$, is given by

$$\theta^2 \sum_{i=k}^{n} \frac{1}{i^2}$$

32. In Section 9.6.1 show that the expected number of $X_i$ that exceed $c^*$ is equal to 1.
33. Let $X_i$ be an exponential random variable with mean $8 + 2i$, for $i = 1, 2, 3$. Use the
results of Section 9.6.1 to obtain an upper bound on $E[\max X_i]$, and then compare
this with the exact result when the $X_i$ are independent.

34. For the model of Section 9.7, compute for a $k$-out-of-$n$ structure (i) the average up
time, (ii) the average down time, and (iii) the system failure rate.

35. Prove the combinatorial identity

$$\binom{n-1}{i-1} = \binom{n}{i} - \binom{n}{i+1} + \cdots \pm \binom{n}{n}, \quad i \le n$$

(a) by induction on $i$
(b) by a backwards induction argument on $i$; that is, prove it first for $i = n$, then
assume it for $i = k$ and show that this implies that it is true for $i = k - 1$.

36. Verify Equation (9.36).
10 Brownian Motion and Stationary Processes


10.1 Brownian Motion 


Let us start by considering the symmetric random walk, which in each time unit 
is equally likely to take a unit step either to the left or to the right. That is, it is 
a Markov chain with $P_{i,i+1} = \frac{1}{2} = P_{i,i-1}$, $i = 0, \pm 1, \ldots$. Now suppose that we
speed up this process by taking smaller and smaller steps in smaller and smaller 
time intervals. If we now go to the limit in the right manner what we obtain is 
Brownian motion. 

More precisely, suppose that each $\Delta t$ time unit we take a step of size $\Delta x$ either
to the left or the right with equal probabilities. If we let $X(t)$ denote the position
at time $t$ then

$$X(t) = \Delta x \,(X_1 + \cdots + X_{[t/\Delta t]}) \tag{10.1}$$

where

$$X_i = \begin{cases} +1, & \text{if the } i\text{th step of length } \Delta x \text{ is to the right} \\ -1, & \text{if it is to the left} \end{cases}$$

$[t/\Delta t]$ is the largest integer less than or equal to $t/\Delta t$, and the $X_i$ are assumed
independent with

$$P\{X_i = 1\} = P\{X_i = -1\} = \tfrac{1}{2}$$
As $E[X_i] = 0$, $\mathrm{Var}(X_i) = E[X_i^2] = 1$, we see from Equation (10.1) that

$$E[X(t)] = 0, \qquad \mathrm{Var}(X(t)) = (\Delta x)^2 \left[\frac{t}{\Delta t}\right] \tag{10.2}$$

We shall now let $\Delta x$ and $\Delta t$ go to 0. However, we must do it in a way such that
the resulting limiting process is nontrivial (for instance, if we let $\Delta x = \Delta t$ and
let $\Delta t \to 0$, then from the preceding we see that $E[X(t)]$ and $\mathrm{Var}(X(t))$ would
both converge to 0 and thus $X(t)$ would equal 0 with probability 1). If we let
$\Delta x = \sigma \sqrt{\Delta t}$ for some positive constant $\sigma$ then from Equation (10.2) we see that
as $\Delta t \to 0$

$$E[X(t)] = 0, \qquad \mathrm{Var}(X(t)) \to \sigma^2 t$$
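The effect of the two scalings on Equation (10.2) can be checked numerically. This is a small illustrative sketch rather than part of the original text; the function names are assumptions chosen for the example.

```python
import math


def walk_variance_sqrt_scaling(t, dt, sigma=1.0):
    """Var(X(t)) from Equation (10.2) when the step size is sigma * sqrt(dt)."""
    dx = sigma * math.sqrt(dt)
    steps = math.floor(t / dt)      # [t / dt], the number of steps taken by time t
    return dx * dx * steps


def walk_variance_linear_scaling(t, dt):
    """Var(X(t)) from Equation (10.2) when the step size equals dt (trivial limit)."""
    dx = dt
    return dx * dx * math.floor(t / dt)


for dt in (1e-2, 1e-4, 1e-6):
    # First column approaches sigma^2 * t = 18, second column approaches 0.
    print(dt, walk_variance_sqrt_scaling(2.0, dt, sigma=3.0),
          walk_variance_linear_scaling(2.0, dt))
```

With the square-root scaling the variance settles near $\sigma^2 t$, whereas taking the step size equal to the time step drives it to 0.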


We now list some intuitive properties of this limiting process obtained by taking
$\Delta x = \sigma \sqrt{\Delta t}$ and then letting $\Delta t \to 0$. From Equation (10.1) and the central limit
theorem the following seems reasonable:


(i) $X(t)$ is normal with mean 0 and variance $\sigma^2 t$. In addition, because the changes of
value of the random walk in nonoverlapping time intervals are independent,
(ii) $\{X(t), t \ge 0\}$ has independent increments, in that for all $t_1 < t_2 < \cdots < t_n$

$$X(t_n) - X(t_{n-1}),\ X(t_{n-1}) - X(t_{n-2}),\ \ldots,\ X(t_2) - X(t_1),\ X(t_1)$$

are independent. Finally, because the distribution of the change in position of the
random walk over any time interval depends only on the length of that interval, it
would appear that

(iii) $\{X(t), t \ge 0\}$ has stationary increments, in that the distribution of $X(t + s) - X(t)$
does not depend on $t$. We are now ready for the following formal definition.


Definition 10.1 A stochastic process $\{X(t), t \ge 0\}$ is said to be a Brownian
motion process if


(i) $X(0) = 0$;
(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;
(iii) for every $t > 0$, $X(t)$ is normally distributed with mean 0 and variance $\sigma^2 t$.


The Brownian motion process, sometimes called the Wiener process, is one of 
the most useful stochastic processes in applied probability theory. It originated 
in physics as a description of Brownian motion. This phenomenon, named after 
the Scottish botanist Robert Brown, who discovered it, is the motion exhibited by
a small particle that is totally immersed in a liquid or gas. Since then, the process 
has been used beneficially in such areas as statistical testing of goodness of fit, 
analyzing the price levels on the stock market, and quantum mechanics. 
The first explanation of the phenomenon of Brownian motion was given by
Einstein in 1905. He showed that Brownian motion could be explained by 
assuming that the immersed particle was continually being subjected to bom- 
bardment by the molecules of the surrounding medium. However, the preceding 
concise definition of this stochastic process underlying Brownian motion was 
given by Wiener in a series of papers originating in 1918. 

When $\sigma = 1$, the process is called standard Brownian motion. Because any
Brownian motion can be converted to the standard process by letting $B(t) =
X(t)/\sigma$ we shall, unless otherwise stated, suppose throughout this chapter that
$\sigma = 1$.

The interpretation of Brownian motion as the limit of the random walks
(Equation (10.1)) suggests that $X(t)$ should be a continuous function of $t$. This turns
out to be the case, and it may be proven that, with probability 1, $X(t)$ is indeed a
continuous function of $t$. This fact is quite deep, and no proof shall be attempted.

As $X(t)$ is normal with mean 0 and variance $t$, its density function is given by

$$f_t(x) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}$$


To obtain the joint density function of $X(t_1), X(t_2), \ldots, X(t_n)$ for $t_1 < \cdots < t_n$,
note first that the set of equalities

$$X(t_1) = x_1,\quad X(t_2) = x_2,\quad \ldots,\quad X(t_n) = x_n$$

is equivalent to

$$X(t_1) = x_1,\quad X(t_2) - X(t_1) = x_2 - x_1,\quad \ldots,\quad X(t_n) - X(t_{n-1}) = x_n - x_{n-1}$$

However, by the independent increment assumption it follows that $X(t_1)$,
$X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent and, by the stationary
increment assumption, that $X(t_k) - X(t_{k-1})$ is normal with mean 0 and
variance $t_k - t_{k-1}$. Hence, the joint density of $X(t_1), \ldots, X(t_n)$ is given by

$$f(x_1, x_2, \ldots, x_n) = f_{t_1}(x_1)\, f_{t_2 - t_1}(x_2 - x_1) \cdots f_{t_n - t_{n-1}}(x_n - x_{n-1})$$

$$= \frac{\exp\left\{ -\frac{1}{2} \left[ \frac{x_1^2}{t_1} + \frac{(x_2 - x_1)^2}{t_2 - t_1} + \cdots + \frac{(x_n - x_{n-1})^2}{t_n - t_{n-1}} \right] \right\}}{(2\pi)^{n/2} \left[ t_1 (t_2 - t_1) \cdots (t_n - t_{n-1}) \right]^{1/2}} \tag{10.3}$$
From this equation, we can compute in principle any desired probabilities. For
instance, suppose we require the conditional distribution of $X(s)$ given that
$X(t) = B$ where $s < t$. The conditional density is

$$f_{s|t}(x \mid B) = \frac{f_s(x)\, f_{t-s}(B - x)}{f_t(B)}$$

$$= K_1 \exp\left\{ -\frac{x^2}{2s} - \frac{(B - x)^2}{2(t - s)} \right\}$$

$$= K_2 \exp\left\{ -x^2 \left( \frac{1}{2s} + \frac{1}{2(t - s)} \right) + \frac{Bx}{t - s} \right\}$$

$$= K_2 \exp\left\{ -\frac{t}{2s(t - s)} \left( x^2 - 2\frac{sB}{t} x \right) \right\}$$

$$= K_3 \exp\left\{ -\frac{(x - Bs/t)^2}{2s(t - s)/t} \right\}$$

where $K_1$, $K_2$, and $K_3$ do not depend on $x$. Hence, we see from the preceding
that the conditional distribution of $X(s)$ given that $X(t) = B$ is, for $s < t$, normal
with mean and variance given by

$$E[X(s) \mid X(t) = B] = \frac{s}{t} B, \qquad \mathrm{Var}[X(s) \mid X(t) = B] = \frac{s}{t}(t - s) \tag{10.4}$$
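As a numerical cross-check of Equation (10.4), the ratio of densities that defines the conditional density can be compared with the normal density having the stated mean and variance. The sketch below is illustrative only; the helper names and the sample values of $s$, $t$, $B$, and $x$ are assumptions.

```python
import math


def normal_pdf(x, var):
    """Density at x of a normal random variable with mean 0 and variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def conditional_density(x, s, t, B):
    """Density of X(s) at x given X(t) = B, as f_s(x) f_{t-s}(B - x) / f_t(B)."""
    return normal_pdf(x, s) * normal_pdf(B - x, t - s) / normal_pdf(B, t)


def bridge_moments(s, t, B):
    """Conditional mean and variance from Equation (10.4)."""
    return s * B / t, s * (t - s) / t


s, t, B, x = 0.3, 1.0, 2.0, 0.9      # arbitrary sample values with s < t
mean, var = bridge_moments(s, t, B)
direct = conditional_density(x, s, t, B)
closed_form = normal_pdf(x - mean, var)
print(direct, closed_form)
```

The two printed values should agree up to floating-point rounding, since completing the square is an exact identity.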


Example 10.1 In a bicycle race between two competitors, let $Y(t)$ denote the
amount of time (in seconds) by which the racer that started in the inside position
is ahead when $100t$ percent of the race has been completed, and suppose that
$\{Y(t), 0 \le t \le 1\}$ can be effectively modeled as a Brownian motion process with
variance parameter $\sigma^2$.

(a) If the inside racer is leading by $\sigma$ seconds at the midpoint of the race, what is the
probability that she is the winner?

(b) If the inside racer wins the race by a margin of $\sigma$ seconds, what is the probability that
she was ahead at the midpoint?


Solution:

(a)
$$P\{Y(1) > 0 \mid Y(1/2) = \sigma\}$$
$$= P\{Y(1) - Y(1/2) > -\sigma \mid Y(1/2) = \sigma\}$$
$$= P\{Y(1) - Y(1/2) > -\sigma\} \quad \text{by independent increments}$$
$$= P\{Y(1/2) > -\sigma\} \quad \text{by stationary increments}$$
$$= P\left\{ \frac{Y(1/2)}{\sigma/\sqrt{2}} > -\sqrt{2} \right\}$$
$$= \Phi(\sqrt{2})$$
$$\approx 0.9213$$

where $\Phi(x) = P\{N(0, 1) \le x\}$ is the standard normal distribution function.

(b) Because we must compute $P\{Y(1/2) > 0 \mid Y(1) = \sigma\}$, let us first determine
the conditional distribution of $Y(s)$ given that $Y(t) = C$, when $s < t$. Now,
since $\{X(t), t \ge 0\}$ is standard Brownian motion when $X(t) = Y(t)/\sigma$, we
obtain from Equation (10.4) that the conditional distribution of $X(s)$, given
that $X(t) = C/\sigma$, is normal with mean $sC/t\sigma$ and variance $s(t - s)/t$. Hence,
the conditional distribution of $Y(s) = \sigma X(s)$ given that $Y(t) = C$ is normal
with mean $sC/t$ and variance $\sigma^2 s(t - s)/t$. Hence,

$$P\{Y(1/2) > 0 \mid Y(1) = \sigma\} = P\{N(\sigma/2, \sigma^2/4) > 0\}$$
$$= P\{N(0, 1) > -1\}$$
$$= \Phi(1)$$
$$\approx 0.8413 \qquad \blacksquare$$
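The two probabilities in Example 10.1 can be reproduced with the standard normal distribution function, written here in terms of `math.erf`. This is a sketch for checking the arithmetic, not part of the original solution; the helper names and the particular value of `sigma` are assumptions (the answers do not depend on it).

```python
import math


def Phi(x):
    """Standard normal distribution function P{N(0, 1) <= x}."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_positive(mean, var):
    """P{N(mean, var) > 0}, which equals Phi(mean / sqrt(var))."""
    return Phi(mean / math.sqrt(var))


sigma = 1.7   # arbitrary positive value; it cancels out of both answers

# (a) Given Y(1/2) = sigma, Y(1) is normal with mean sigma and variance sigma^2 / 2.
p_a = prob_positive(sigma, sigma ** 2 / 2.0)

# (b) Given Y(1) = sigma, Y(1/2) is normal with mean sigma / 2 and variance sigma^2 / 4.
p_b = prob_positive(sigma / 2.0, sigma ** 2 / 4.0)

print(round(p_a, 4), round(p_b, 4))
```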


10.2 Hitting Times, Maximum Variable, and the 
Gambler's Ruin Problem 


Let $T_a$ denote the first time the Brownian motion process hits $a$. When $a > 0$ we
will compute $P\{T_a \le t\}$ by considering $P\{X(t) \ge a\}$ and conditioning on whether
or not $T_a \le t$. This gives

$$P\{X(t) \ge a\} = P\{X(t) \ge a \mid T_a \le t\} P\{T_a \le t\} + P\{X(t) \ge a \mid T_a > t\} P\{T_a > t\} \tag{10.5}$$

Now if $T_a \le t$, then the process hits $a$ at some point in $[0, t]$ and, by symmetry,
it is just as likely to be above $a$ or below $a$ at time $t$. That is,

$$P\{X(t) \ge a \mid T_a \le t\} = \tfrac{1}{2}$$

As the second right-hand term of Equation (10.5) is clearly equal to 0 (since,
by continuity, the process value cannot be greater than $a$ without having yet hit
$a$), we see that

$$P\{T_a \le t\} = 2P\{X(t) \ge a\} = \frac{2}{\sqrt{2\pi t}} \int_a^{\infty} e^{-x^2/2t}\, dx = \frac{2}{\sqrt{2\pi}} \int_{a/\sqrt{t}}^{\infty} e^{-y^2/2}\, dy, \quad a > 0 \tag{10.6}$$

For $a < 0$, the distribution of $T_a$ is, by symmetry, the same as that of $T_{-a}$.
Hence, from Equation (10.6) we obtain

$$P\{T_a \le t\} = \frac{2}{\sqrt{2\pi}} \int_{|a|/\sqrt{t}}^{\infty} e^{-y^2/2}\, dy \tag{10.7}$$
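Because the integral in Equation (10.7) is a standard normal tail, the hitting-time distribution can be evaluated as $2(1 - \Phi(|a|/\sqrt{t}))$. The following is a minimal sketch under that observation; the function names are illustrative assumptions.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hitting_prob(a, t):
    """P{T_a <= t} for standard Brownian motion, from Equation (10.7)."""
    return 2.0 * (1.0 - Phi(abs(a) / math.sqrt(t)))


for t in (0.5, 1.0, 10.0, 1e4):
    # The probability of having hit level 1 grows toward 1 as t increases.
    print(t, hitting_prob(1.0, t))
```

The printed values increase toward 1, which reflects the fact that $T_a$ is finite with probability 1.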




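Another random variable of interest is the maximum value the process attains
in $[0, t]$. Its distribution is obtained as follows: For $a > 0$

$$P\left\{ \max_{0 \le s \le t} X(s) \ge a \right\} = P\{T_a \le t\} \quad \text{by continuity}$$
$$= 2P\{X(t) \ge a\} \quad \text{from (10.6)}$$
$$= \frac{2}{\sqrt{2\pi}} \int_{a/\sqrt{t}}^{\infty} e^{-y^2/2}\, dy$$
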
Let us now consider the probability that Brownian motion hits $A$ before $-B$
where $A > 0$, $B > 0$. To compute this we shall make use of the interpretation
of Brownian motion as being a limit of the symmetric random walk. To start let
us recall from the results of the gambler's ruin problem (see Section 4.5.1) that
the probability that the symmetric random walk goes up $A$ before going down
$B$ when each step is equally likely to be either up or down a distance $\Delta x$ is (by
Equation (4.14) with $N = (A + B)/\Delta x$, $i = B/\Delta x$) equal to $B\Delta x/((A + B)\Delta x) =
B/(A + B)$.
Hence, upon letting $\Delta x \to 0$, we see that

$$P\{\text{up } A \text{ before down } B\} = \frac{B}{A + B}$$
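The random-walk probability used in this argument can be recovered numerically by solving the first-step equations $h(i) = \frac{1}{2}h(i-1) + \frac{1}{2}h(i+1)$ with $h(0) = 0$ and $h(N) = 1$. The sketch below uses simple repeated averaging (a Gauss-Seidel iteration), which is an illustrative choice rather than the method of the text; the function name and parameters are assumptions.

```python
def prob_up_before_down(A, B, dx, sweeps=5000):
    """Probability that a symmetric walk with step dx rises A before falling B.

    The walk starts at grid index B/dx on a grid 0..N with N = (A + B)/dx,
    where index 0 means "down B" and index N means "up A".
    """
    n_up, n_down = round(A / dx), round(B / dx)
    N, start = n_up + n_down, n_down
    h = [0.0] * (N + 1)
    h[N] = 1.0                                   # boundary: already up A
    for _ in range(sweeps):
        for i in range(1, N):
            h[i] = 0.5 * (h[i - 1] + h[i + 1])   # condition on the first step
    return h[start]


p = prob_up_before_down(A=3.0, B=2.0, dx=0.5)
print(p, 2.0 / (3.0 + 2.0))
```

The result does not depend on the step size, which is why the same value $B/(A + B)$ survives in the Brownian motion limit.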


10.3 Variations on Brownian Motion 


10.3.1 Brownian Motion with Drift 


We say that $\{X(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$
and variance parameter $\sigma^2$ if


(i) $X(0) = 0$;
(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;
(iii) $X(t)$ is normally distributed with mean $\mu t$ and variance $t\sigma^2$.


An equivalent definition is to let $\{B(t), t \ge 0\}$ be standard Brownian motion
and then define

$$X(t) = \sigma B(t) + \mu t$$


10.3.2 Geometric Brownian Motion 


If $\{Y(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and variance
parameter $\sigma^2$, then the process $\{X(t), t \ge 0\}$ defined by

$$X(t) = e^{Y(t)}$$

is called geometric Brownian motion.
For a geometric Brownian motion process $\{X(t)\}$, let us compute the expected
value of the process at time $t$ given the history of the process up to time $s$. That
is, for $s < t$, consider $E[X(t) \mid X(u), 0 \le u \le s]$. Now,

$$E[X(t) \mid X(u), 0 \le u \le s] = E[e^{Y(t)} \mid Y(u), 0 \le u \le s]$$
$$= E[e^{Y(s) + Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$
$$= e^{Y(s)} E[e^{Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$
$$= X(s) E[e^{Y(t) - Y(s)}]$$

where the next to last equality follows from the fact that $Y(s)$ is given, and the last
equality from the independent increment property of Brownian motion. Now, the
moment generating function of a normal random variable $W$ is given by

$$E[e^{aW}] = e^{aE[W] + a^2 \mathrm{Var}(W)/2}$$

Hence, since $Y(t) - Y(s)$ is normal with mean $\mu(t - s)$ and variance $(t - s)\sigma^2$, it
follows by setting $a = 1$ that

$$E[e^{Y(t) - Y(s)}] = e^{\mu(t - s) + (t - s)\sigma^2/2}$$

Thus, we obtain

$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)\, e^{(t - s)(\mu + \sigma^2/2)} \tag{10.8}$$
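Equation (10.8) can be compared with a simulation in which the increment $Y(t) - Y(s)$ is drawn directly from its normal distribution. The sketch below is illustrative only; the function names, the parameter values, the sample size, and the seed are all assumptions made for the example.

```python
import math
import random


def gbm_conditional_mean(x_s, s, t, mu, sigma):
    """E[X(t) | history up to time s] from Equation (10.8)."""
    return x_s * math.exp((t - s) * (mu + sigma ** 2 / 2.0))


def simulate_mean(x_s, s, t, mu, sigma, n=100_000, seed=0):
    """Monte Carlo estimate using X(t) = X(s) * exp(Y(t) - Y(s)),
    where Y(t) - Y(s) is normal with mean mu (t - s) and variance sigma^2 (t - s)."""
    rng = random.Random(seed)
    m, sd = mu * (t - s), sigma * math.sqrt(t - s)
    return x_s * sum(math.exp(rng.gauss(m, sd)) for _ in range(n)) / n


exact = gbm_conditional_mean(100.0, 1.0, 2.0, mu=0.05, sigma=0.2)
estimate = simulate_mean(100.0, 1.0, 2.0, mu=0.05, sigma=0.2)
print(exact, estimate)
```

Note that when $\mu = -\sigma^2/2$ the conditional mean equals $X(s)$, so the expected future value is the current value.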


Geometric Brownian motion is useful in the modeling of stock prices over 
time when you feel that the percentage changes are independent and identically 
distributed. For instance, suppose that $X_n$ is the price of some stock at time $n$.
Then, it might be reasonable to suppose that $X_n/X_{n-1}$, $n \ge 1$, are independent
and identically distributed. Let

$$Y_n = X_n/X_{n-1}$$

and so

$$X_n = Y_n X_{n-1}$$

Iterating this equality gives

$$X_n = Y_n Y_{n-1} X_{n-2}$$
$$= Y_n Y_{n-1} Y_{n-2} X_{n-3}$$
$$\vdots$$
$$= Y_n Y_{n-1} \cdots Y_1 X_0$$

Thus,

$$\log(X_n) = \sum_{i=1}^{n} \log(Y_i) + \log(X_0)$$

Since $\log(Y_i)$, $i \ge 1$, are independent and identically distributed, $\{\log(X_n)\}$ will,
when suitably normalized, approximately be Brownian motion with a drift, and
so $\{X_n\}$ will be approximately geometric Brownian motion.
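The telescoping step is easy to confirm on data. In the sketch below the price series is made up purely for illustration; the point is only that the log price equals the initial log price plus the sum of the log ratios.

```python
import math

# Hypothetical price series X_0, X_1, ..., X_4 (illustrative values only).
prices = [100.0, 104.0, 101.0, 107.0, 110.0]

# Y_n = X_n / X_{n-1} for n = 1, ..., 4.
ratios = [prices[n] / prices[n - 1] for n in range(1, len(prices))]

# log(X_n) = sum_i log(Y_i) + log(X_0): a random walk in the log ratios.
log_price = math.log(prices[0]) + sum(math.log(y) for y in ratios)

print(log_price, math.log(prices[-1]))
```

Writing the log price as a sum of independent and identically distributed terms is what makes the random-walk (and hence Brownian motion) approximation natural.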


10.4 Pricing Stock Options 


10.4.1 An Example in Options Pricing 


In situations in which money is to be received or paid out in differing time periods, 
we must take into account the time value of money. That is, to be given the amount 
v a time t in the future is not worth as much as being given v immediately. The 
reason for this is that if we were immediately given v, then it could be loaned out 
with interest and so be worth more than $v$ at time $t$. To take this into account, we
will suppose that the time 0 value, also called the present value, of the amount $v$
to be earned at time $t$ is $ve^{-\alpha t}$. The quantity $\alpha$ is often called the discount factor.
In economic terms, the assumption of the discount function $e^{-\alpha t}$ is equivalent to
the assumption that we can earn interest at a continuously compounded rate of
$100\alpha$ percent per unit time.

We will now consider a simple model for pricing an option to purchase a stock 
at a future time at a fixed price. 

Suppose the present price of a stock is $100 per unit share, and suppose we 
know that after one time period it will be, in present value dollars, either $200 or 
$50 (see Figure 10.1). It should be noted that the prices at time 1 are the present 
value (or time 0) prices. That is, if the discount factor is $\alpha$, then the actual possible
prices at time 1 are either $200e^{\alpha}$ or $50e^{\alpha}$. To keep the notation simple, we will
suppose that all prices given are time 0 prices. 

Suppose that for any $y$, at a cost of $cy$, you can purchase at time 0 the option
to buy $y$ shares of the stock at time 1 at a (time 0) cost of \$150 per share. Thus,
for instance, if you do purchase this option and the stock rises to \$200, then you
would exercise the option at time 1 and realize a gain of \$200 − 150 = \$50 for
each of the $y$ option units purchased. On the other hand, if the price at time 1
was \$50, then the option would be worthless at time 1. In addition, at a cost
of $100x$ you can purchase $x$ units of the stock at time 0, and this will be worth
either $200x$ or $50x$ at time 1.

Figure 10.1 Stock price of 100 at time 0 moving to either 200 or 50 at time 1.

We will suppose that both $x$ and $y$ can be either positive or negative (or zero).
That is, you can either buy or sell both the stock and the option. For instance, if 
x were negative then you would be selling —x shares of the stock, yielding you 
a return of —100x, and you would then be responsible for buying —x shares of 
the stock at time 1 at a cost of either $200 or $50 per share. 

We are interested in determining the appropriate value of c, the unit cost of an 
option. Specifically, we will show that unless c = 50/3 there will be a combination 
of purchases that will always result in a positive gain. 

To show this, suppose that at time 0 we 


buy x units of stock, and 
buy y units of options 


where x and y (which can be either positive or negative) are to be determined. 
The value of our holding at time 1 depends on the price of the stock at that time; 
and it is given by the following 


$$\text{value} = \begin{cases} 200x + 50y, & \text{if price is 200} \\ 50x, & \text{if price is 50} \end{cases}$$

The preceding formula follows by noting that if the price is 200 then the x units 
of the stock are worth 200x, and the y units of the option to buy the stock at a 
unit price of 150 are worth (200 — 150)y. On the other hand, if the stock price 
is 50, then the x units are worth 50x and the y units of the option are worthless. 
Now, suppose we choose y to be such that the preceding value is the same no 
matter what the price at time 1. That is, we choose y so that 


$$200x + 50y = 50x$$

or

$$y = -3x$$

(Note that $y$ has the opposite sign of $x$, and so if $x$ is positive and as a result $x$
units of the stock are purchased at time 0, then $3x$ units of stock options are also
sold at that time. Similarly, if $x$ is negative, then $-x$ units of stock are sold and
$-3x$ units of stock options are purchased at time 0.)
Thus, with $y = -3x$, the value of our holding at time 1 is

$$\text{value} = 50x$$

Since the original cost of purchasing $x$ units of the stock and $-3x$ units of
options is

$$\text{original cost} = 100x - 3xc,$$

we see that our gain on the transaction is

$$\text{gain} = 50x - (100x - 3xc) = x(3c - 50)$$

Thus, if $3c = 50$, then the gain is 0; on the other hand if $3c \ne 50$, we can guarantee
a positive gain (no matter what the price of the stock at time 1) by letting $x$ be
positive when $3c > 50$ and letting it be negative when $3c < 50$.

For instance, if the unit cost per option is c = 20, then purchasing 1 unit 
of the stock (x = 1) and simultaneously selling 3 units of the option (y = —3) 
initially costs us 100 — 60 = 40. However, the value of this holding at time 1 is 
50 whether the stock goes up to 200 or down to 50. Thus, a guaranteed profit of 
10 is attained. Similarly, if the unit cost per option is c = 15, then selling 1 unit 
of the stock (x = —1) and buying 3 units of the option (y = 3) leads to an initial 
gain of 100 — 45 = 55. On the other hand, the value of this holding at time 1 is 
—50. Thus, a guaranteed profit of 5 is attained. 

A sure win betting scheme is called an arbitrage. Thus, as we have just seen, 
the only option cost c that does not result in an arbitrage is c = 50/3. 
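The hedging argument of this section can be replayed numerically. The sketch below hard-codes the numbers of the example (stock at 100, time-1 prices 200 or 50, strike 150); the function names are illustrative assumptions, and exact rational arithmetic is used so that the case $c = 50/3$ comes out as exactly 0.

```python
from fractions import Fraction


def holding_gain(x, y, c, price_at_1):
    """Gain from buying x shares at 100 and y options at unit cost c,
    when the (present value) stock price at time 1 is price_at_1."""
    option_payoff = max(price_at_1 - 150, 0)     # option to buy at 150
    value = price_at_1 * x + option_payoff * y
    cost = 100 * x + c * y
    return value - cost


def hedged_gain(x, c):
    """Gain with y = -3x, which is the same whether the price is 200 or 50."""
    y = -3 * x
    gains = {holding_gain(x, y, c, p) for p in (200, 50)}
    assert len(gains) == 1                       # the hedge removes all risk
    return gains.pop()


print(hedged_gain(1, 20))                # option too expensive: buy stock, sell options
print(hedged_gain(-1, 15))               # option too cheap: sell stock, buy options
print(hedged_gain(1, Fraction(50, 3)))   # the no-arbitrage cost
```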


10.4.2 The Arbitrage Theorem 


Consider an experiment whose set of possible outcomes is S = {1,2,...,m}. 
Suppose that n wagers are available. If the amount x is bet on wager i, then the 
return $x r_i(j)$ is earned if the outcome of the experiment is $j$. In other words, $r_i(\cdot)$
is the return function for a unit bet on wager $i$. The amount bet on a wager is
allowed to be either positive or negative or zero. 

A betting scheme is a vector $\mathbf{x} = (x_1, \ldots, x_n)$ with the interpretation that $x_1$
is bet on wager 1, $x_2$ on wager 2, ..., and $x_n$ on wager $n$. If the outcome of the
experiment is $j$, then the return from the betting scheme $\mathbf{x}$ is

$$\text{return from } \mathbf{x} = \sum_{i=1}^{n} x_i r_i(j)$$


The following theorem states that either there exists a probability vector $\mathbf{p} =
(p_1, \ldots, p_m)$ on the set of possible outcomes of the experiment under which each
of the wagers has expected return 0, or else there is a betting scheme that
guarantees a positive win.


Theorem 10.1 (The Arbitrage Theorem) Exactly one of the following is true:
Either


(i) there exists a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ for which

$$\sum_{j=1}^{m} p_j r_i(j) = 0, \quad \text{for all } i = 1, \ldots, n$$

or
(ii) there exists a betting scheme $\mathbf{x} = (x_1, \ldots, x_n)$ for which

$$\sum_{i=1}^{n} x_i r_i(j) > 0, \quad \text{for all } j = 1, \ldots, m$$


In other words, if $X$ is the outcome of the experiment, then the arbitrage
theorem states that either there is a probability vector $\mathbf{p}$ for $X$ such that

$$E_{\mathbf{p}}[r_i(X)] = 0, \quad \text{for all } i = 1, \ldots, n$$

or else there is a betting scheme that leads to a sure win.


Remark This theorem is a consequence of the (linear algebra) theorem of the 
separating hyperplane, which is often used as a mechanism to prove the duality 
theorem of linear programming. 

The theory of linear programming can be used to determine a betting strategy
that guarantees the greatest return. Suppose that the absolute value of the amount
bet on each wager must be less than or equal to 1. To determine the vector $\mathbf{x}$ that
yields the greatest guaranteed win (call this win $v$), we need to choose $\mathbf{x}$ and $v$
so as to maximize $v$, subject to the constraints

$$\sum_{i=1}^{n} x_i r_i(j) \ge v, \quad \text{for } j = 1, \ldots, m$$
$$-1 \le x_i \le 1, \quad i = 1, \ldots, n$$

This optimization problem is a linear program and can be solved by standard
techniques (such as by using the simplex algorithm). The arbitrage theorem yields
that the optimal $v$ will be positive unless there is a probability vector $\mathbf{p}$ for which
$\sum_{j=1}^{m} p_j r_i(j) = 0$ for all $i = 1, \ldots, n$.


Example 10.2 In some situations, the only types of wagers allowed are to choose
one of the outcomes $i$, $i = 1, \ldots, m$, and bet that $i$ is the outcome of the
experiment. The return from such a bet is often quoted in terms of "odds." If the odds
for outcome $i$ are $o_i$ (often written as "$o_i$ to 1") then a 1-unit bet will return $o_i$
if the outcome of the experiment is $i$ and will return $-1$ otherwise. That is,

$$r_i(j) = \begin{cases} o_i, & \text{if } j = i \\ -1, & \text{otherwise} \end{cases}$$

Suppose the odds $o_1, \ldots, o_m$ are posted. In order for there not to be a sure win
there must be a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ such that

$$0 = E_{\mathbf{p}}[r_i(X)] = o_i p_i - (1 - p_i)$$

That is, we must have

$$p_i = \frac{1}{1 + o_i}$$

Since the $p_i$ must sum to 1, this means that the condition for there not to be an
arbitrage is that

$$\sum_{i=1}^{m} (1 + o_i)^{-1} = 1$$

Thus, if the posted odds are such that $\sum_i (1 + o_i)^{-1} \ne 1$, then a sure win is
possible. For instance, suppose there are three possible outcomes and the odds
are as follows:

Outcome   Odds
1         1
2         2
3         3

That is, the odds for outcome 1 are 1 to 1, the odds for outcome 2 are 2 to 1, and
those for outcome 3 are 3 to 1. Since

$$\frac{1}{2} + \frac{1}{3} + \frac{1}{4} = \frac{13}{12} \ne 1$$

a sure win is possible. One possibility is to bet $-1$ on outcome 1 (so that you
win 1 if the outcome is not 1 and lose 1 if the outcome is 1) and bet $-0.7$
on outcome 2, and $-0.5$ on outcome 3. If the experiment results in outcome 1,
then we win $-1 + 0.7 + 0.5 = 0.2$; if it results in outcome 2, then we win
$1 - 1.4 + 0.5 = 0.1$; if it results in outcome 3, then we win $1 + 0.7 - 1.5 = 0.2$.
Hence, in all cases we win a positive amount. ■
Remark If $\sum_i (1 + o_i)^{-1} \ne 1$, then the betting scheme

$$x_i = \frac{(1 + o_i)^{-1}}{1 - \sum_{i} (1 + o_i)^{-1}}, \quad i = 1, \ldots, m$$

will always yield a gain of exactly 1.
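Both the betting scheme of the example with posted odds 1, 2, 3 and the general scheme of the remark can be checked with exact rational arithmetic. This is an illustrative sketch; the function names `unit_gain_scheme` and `returns` are assumptions introduced for the example.

```python
from fractions import Fraction


def unit_gain_scheme(odds):
    """Bets x_i = (1 + o_i)^{-1} / (1 - sum_i (1 + o_i)^{-1}) from the remark."""
    inv = [Fraction(1, 1 + o) for o in odds]
    total = sum(inv)
    if total == 1:
        raise ValueError("posted odds admit no arbitrage")
    return [q / (1 - total) for q in inv]


def returns(bets, odds):
    """Total return for each possible outcome j: a bet x_i on outcome i
    returns o_i * x_i if j == i and -x_i otherwise."""
    m = len(odds)
    return [
        sum(x * (o if i == j else -1) for i, (x, o) in enumerate(zip(bets, odds)))
        for j in range(m)
    ]


odds = [1, 2, 3]
bets = unit_gain_scheme(odds)
print(bets, returns(bets, odds))

# The bets used in the example: -1, -0.7, and -0.5.
example_bets = [Fraction(-1), Fraction(-7, 10), Fraction(-1, 2)]
print(returns(example_bets, odds))
```

For these odds the scheme of the remark bets $-6$, $-4$, and $-3$, and the return is 1 whichever outcome occurs.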


Example 10.3 Let us reconsider the option pricing example of the previous sec- 
tion, where the initial price of a stock is 100 and the present value of the price at 
time 1 is either 200 or 50. At a cost of c per share we can purchase at time 0 the 
option to buy the stock at time 1 at a present value price of 150 per share. The 
problem is to set the value of ¢ so that no sure win is possible. 

In the context of this section, the outcome of the experiment is the value of 
the stock at time 1. Thus, there are two possible outcomes. There are also two 
different wagers: to buy (or sell) the stock, and to buy (or sell) the option. By 
the arbitrage theorem, there will be no sure win if there is a probability vector 
(p, 1 — p) that makes the expected return under both wagers equal to 0. 

Now, the return from purchasing 1 unit of the stock is 

$$\text{return} = \begin{cases} 200 - 100 = 100, & \text{if the price is 200 at time 1} \\ 50 - 100 = -50, & \text{if the price is 50 at time 1} \end{cases}$$

Hence, if $p$ is the probability that the price is 200 at time 1, then 

$$E[\text{return}] = 100p - 50(1 - p)$$

Setting this equal to 0 yields 

$$p = \tfrac{1}{3}$$

That is, the only probability vector $(p, 1 - p)$ for which wager 1 yields an expected 
return 0 is the vector $(\tfrac{1}{3}, \tfrac{2}{3})$. 

Now, the return from purchasing one share of the option is 

$$\text{return} = \begin{cases} 50 - c, & \text{if the price is 200} \\ -c, & \text{if the price is 50} \end{cases}$$

Hence, the expected return when $p = \tfrac{1}{3}$ is 

$$E[\text{return}] = (50 - c)\tfrac{1}{3} - c\,\tfrac{2}{3} = \tfrac{50}{3} - c$$

Thus, it follows from the arbitrage theorem that the only value of $c$ for 
which there will not be a sure win is $c = \tfrac{50}{3}$, which verifies the result of 
Section 10.4.1. 
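The two steps of this example (solve for the probability that makes the stock wager fair, then price the option under it) can be written as a short sketch. The function names below are illustrative assumptions, and the code handles only the two-outcome setting of the example.

```python
from fractions import Fraction

def risk_neutral_up_probability(s0, up, down):
    # Solve p * (up - s0) + (1 - p) * (down - s0) = 0 for p.
    return Fraction(s0 - down, up - down)

def option_cost(s0, up, down, strike):
    # Expected payoff of the option under the probability found above;
    # all prices are present values, so no further discounting is applied.
    p = risk_neutral_up_probability(s0, up, down)
    payoff_up = max(up - strike, 0)
    payoff_down = max(down - strike, 0)
    return p * payoff_up + (1 - p) * payoff_down

print(risk_neutral_up_probability(100, 200, 50))  # 1/3
print(option_cost(100, 200, 50, 150))             # 50/3
```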


10.4.3 The Black-Scholes Option Pricing Formula 


Suppose the present price of a stock is $X(0) = x_0$, and let $X(t)$ denote its price 
at time $t$. Suppose we are interested in the stock over the time interval 0 to $T$. 
Assume that the discount factor is $\alpha$ (equivalently, the interest rate is $100\alpha$ percent 
compounded continuously), and so the present value of the stock price at time $t$ 
is $e^{-\alpha t}X(t)$. 

We can regard the evolution of the price of the stock over time as our experiment, 
and thus the outcome of the experiment is the value of the function $X(t)$, 
$0 \le t \le T$. The types of wagers available are that for any $s < t$ we can observe 
the process for a time $s$ and then buy (or sell) shares of the stock at price $X(s)$ 
and then sell (or buy) these shares at time $t$ for the price $X(t)$. In addition, we 
will suppose that we may purchase any of $N$ different options at time 0. Option 
$i$, costing $c_i$ per share, gives us the option of purchasing shares of the stock at 
time $t_i$ for the fixed price of $K_i$ per share, $i = 1, \ldots, N$. 

Suppose we want to determine values of the $c_i$ for which there is no betting 
strategy that leads to a sure win. Assuming that the arbitrage theorem can be 
generalized (to handle the preceding situation, where the outcome of the experiment 
is a function), it follows that there will be no sure win if and only if there exists 
a probability measure over the set of outcomes under which all of the wagers 
have expected return 0. Let $P$ be a probability measure on the set of outcomes. 
Consider first the wager of observing the stock for a time $s$ and then purchasing 
(or selling) one share with the intention of selling (or purchasing) it at time 
$t$, $0 \le s < t \le T$. The present value of the amount paid for the stock is $e^{-\alpha s}X(s)$, 
whereas the present value of the amount received is $e^{-\alpha t}X(t)$. Hence, in order for 
the expected return of this wager to be 0 when $P$ is the probability measure on 
$X(t)$, $0 \le t \le T$, we must have 

$$E_P[e^{-\alpha t}X(t) \mid X(u), 0 \le u \le s] = e^{-\alpha s}X(s) \tag{10.9}$$


Consider now the wager of purchasing an option. Suppose the option gives us 
the right to buy one share of the stock at time $t$ for a price $K$. At time $t$, the worth 
of this option will be as follows: 

$$\text{worth of option at time } t = \begin{cases} X(t) - K, & \text{if } X(t) \ge K \\ 0, & \text{if } X(t) < K \end{cases}$$

That is, the time $t$ worth of the option is $(X(t) - K)^+$. Hence, the present value of 
the worth of the option is $e^{-\alpha t}(X(t) - K)^+$. If $c$ is the (time 0) cost of the option, 
we see that, in order for purchasing the option to have expected (present value) 
return 0, we must have 

$$E_P[e^{-\alpha t}(X(t) - K)^+] = c \tag{10.10}$$


By the arbitrage theorem, if we can find a probability measure $P$ on the set of 
outcomes that satisfies Equation (10.9), then if $c$, the cost of an option to purchase 
one share at time $t$ at the fixed price $K$, is as given in Equation (10.10), then no 
arbitrage is possible. On the other hand, if for given prices $c_i$, $i = 1, \ldots, N$, there 
is no probability measure $P$ that satisfies both (10.9) and the equality 

$$c_i = E_P[e^{-\alpha t_i}(X(t_i) - K_i)^+], \quad i = 1, \ldots, N$$

then a sure win is possible. 

We will now present a probability measure $P$ on the outcome $X(t)$, $0 \le t \le T$, 
that satisfies Equation (10.9). 

Suppose that 

$$X(t) = x_0 e^{Y(t)}$$

where $\{Y(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and 
variance parameter $\sigma^2$. That is, $\{X(t), t \ge 0\}$ is a geometric Brownian motion 
process (see Section 10.3.2). From Equation (10.8) we have that, for $s < t$, 

$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)e^{(t - s)(\mu + \sigma^2/2)}$$

Hence, if we choose $\mu$ and $\sigma^2$ so that 

$$\mu + \sigma^2/2 = \alpha$$

then Equation (10.9) will be satisfied. That is, by letting $P$ be the probability 
measure governing the stochastic process $\{x_0 e^{Y(t)}, 0 \le t \le T\}$, where $\{Y(t)\}$ is 
Brownian motion with drift parameter $\mu$ and variance parameter $\sigma^2$, and where 
$\mu + \sigma^2/2 = \alpha$, Equation (10.9) is satisfied. 

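It follows from the preceding that if we price an option to purchase a share of 
the stock at time $t$ for a fixed price $K$ by 

$$c = E_P[e^{-\alpha t}(X(t) - K)^+]$$

then no arbitrage is possible. Since $X(t) = x_0 e^{Y(t)}$, where $Y(t)$ is normal with 
mean $\mu t$ and variance $t\sigma^2$, we see that 

$$\begin{aligned}
ce^{\alpha t} &= \int_{-\infty}^{\infty} (x_0 e^{y} - K)^+ \frac{1}{\sqrt{2\pi t\sigma^2}}\, e^{-(y - \mu t)^2/2t\sigma^2}\, dy \\
&= \int_{\log(K/x_0)}^{\infty} (x_0 e^{y} - K) \frac{1}{\sqrt{2\pi t\sigma^2}}\, e^{-(y - \mu t)^2/2t\sigma^2}\, dy
\end{aligned}$$

Making the change of variable $w = (y - \mu t)/(\sigma t^{1/2})$ yields 

$$ce^{\alpha t} = x_0 e^{\mu t}\frac{1}{\sqrt{2\pi}}\int_a^{\infty} e^{\sigma w\sqrt{t}}\, e^{-w^2/2}\, dw - K\frac{1}{\sqrt{2\pi}}\int_a^{\infty} e^{-w^2/2}\, dw \tag{10.11}$$

where 

$$a = \frac{\log(K/x_0) - \mu t}{\sigma\sqrt{t}}$$

Now, 

$$\begin{aligned}
\frac{1}{\sqrt{2\pi}}\int_a^{\infty} e^{\sigma w\sqrt{t}}\, e^{-w^2/2}\, dw &= e^{t\sigma^2/2}\frac{1}{\sqrt{2\pi}}\int_a^{\infty} e^{-(w - \sigma\sqrt{t})^2/2}\, dw \\
&= e^{t\sigma^2/2} P\{N(\sigma\sqrt{t}, 1) \ge a\} \\
&= e^{t\sigma^2/2} P\{N(0, 1) \ge a - \sigma\sqrt{t}\} \\
&= e^{t\sigma^2/2} P\{N(0, 1) \le -(a - \sigma\sqrt{t})\} \\
&= e^{t\sigma^2/2} \phi(\sigma\sqrt{t} - a)
\end{aligned}$$

where $N(m, v)$ is a normal random variable with mean $m$ and variance $v$, and $\phi$ 
is the standard normal distribution function. 
Thus, we see from Equation (10.11) that 

$$ce^{\alpha t} = x_0 e^{\mu t + \sigma^2 t/2} \phi(\sigma\sqrt{t} - a) - K\phi(-a)$$

Using that 

$$\mu + \sigma^2/2 = \alpha$$

and letting $b = -a$, we can write this as follows: 

$$c = x_0 \phi(\sigma\sqrt{t} + b) - Ke^{-\alpha t}\phi(b) \tag{10.12}$$

where 

$$b = \frac{\alpha t - \sigma^2 t/2 - \log(K/x_0)}{\sigma\sqrt{t}}$$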


The option price formula given by Equation (10.12) depends on the initial 
price of the stock $x_0$, the option exercise time $t$, the option exercise price $K$, the 
discount (or interest rate) factor $\alpha$, and the value $\sigma^2$. Note that for any value of 
$\sigma^2$, if the options are priced according to the formula of Equation (10.12) then no 
arbitrage is possible. However, as many people believe that the price of a stock 
actually follows a geometric Brownian motion (that is, $X(t) = x_0 e^{Y(t)}$ where 
$Y(t)$ is Brownian motion with parameters $\mu$ and $\sigma^2$), it has been suggested that it 
is natural to price the option according to the formula of Equation (10.12) with 
the parameter $\sigma^2$ taken equal to the estimated value (see the remark that follows) 
of the variance parameter under the assumption of a geometric Brownian motion 
model. When this is done, the formula of Equation (10.12) is known as the 
Black-Scholes option cost valuation. It is interesting that this valuation does not depend 
on the value of the drift parameter $\mu$ but only on the variance parameter $\sigma^2$. 

If the option itself can be traded, then the formula of Equation (10.12) can be 
used to set its price in such a way that no arbitrage is possible. If at time $s$ 
the price of the stock is $X(s) = x_s$, then the price of a $(t, K)$ option (that is, an 
option to purchase one unit of the stock at time $t$ for a price $K$) should be set by 
replacing $t$ by $t - s$ and $x_0$ by $x_s$ in Equation (10.12). 
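Equation (10.12) is straightforward to evaluate numerically. The sketch below is one possible transcription; the function names are illustrative, and the standard normal distribution function is computed from the error function in Python's `math` module. The numerical inputs are arbitrary sample values, not taken from the text.

```python
import math

def std_normal_cdf(x):
    # The standard normal distribution function (phi in the text).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_cost(x0, t, K, alpha, sigma2):
    """Equation (10.12): c = x0*phi(sigma*sqrt(t) + b) - K*exp(-alpha*t)*phi(b)."""
    sigma = math.sqrt(sigma2)
    b = (alpha * t - sigma2 * t / 2.0 - math.log(K / x0)) / (sigma * math.sqrt(t))
    return (x0 * std_normal_cdf(sigma * math.sqrt(t) + b)
            - K * math.exp(-alpha * t) * std_normal_cdf(b))

def cost_at_time_s(xs, s, t, K, alpha, sigma2):
    # Price at time s of a (t, K) option: replace t by t - s and x0 by xs.
    return black_scholes_cost(xs, t - s, K, alpha, sigma2)

c = black_scholes_cost(x0=100.0, t=1.0, K=100.0, alpha=0.05, sigma2=0.04)
print(round(c, 4))
```

Note that the drift $\mu$ does not appear among the arguments, in line with the observation that the valuation depends only on $\sigma^2$.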


Remark If we observe a Brownian motion process with variance parameter $\sigma^2$ 
over any time interval, then we could theoretically obtain an arbitrarily precise 
estimate of $\sigma^2$. For suppose we observe such a process $\{Y(s)\}$ for a time $t$. 
Then, for fixed $h$, let $N = [t/h]$ and set 

$$\begin{aligned}
W_1 &= Y(h) - Y(0), \\
W_2 &= Y(2h) - Y(h), \\
&\;\;\vdots \\
W_N &= Y(Nh) - Y(Nh - h)
\end{aligned}$$

Then random variables $W_1, \ldots, W_N$ are independent and identically distributed 
normal random variables having variance $h\sigma^2$. We now use the fact (see 
Section 3.6.4) that $(N - 1)S^2/(\sigma^2 h)$ has a chi-squared distribution with $N - 1$ degrees 
of freedom, where $S^2$ is the sample variance defined by 

$$S^2 = \sum_{i=1}^{N} (W_i - \bar{W})^2/(N - 1)$$

Since the expected value and variance of a chi-squared with $k$ degrees of freedom 
are equal to $k$ and $2k$, respectively, we see that 

$$E[(N - 1)S^2/(\sigma^2 h)] = N - 1$$

and 

$$\operatorname{Var}[(N - 1)S^2/(\sigma^2 h)] = 2(N - 1)$$

From this, we see that 

$$E[S^2/h] = \sigma^2$$

and 

$$\operatorname{Var}[S^2/h] = 2\sigma^4/(N - 1)$$

Hence, as we let $h$ become smaller (and so $N = [t/h]$ becomes larger) the variance 
of the unbiased estimator of $\sigma^2$ becomes arbitrarily small. 
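The effect described in this remark can be seen in a small simulation. This is only an illustrative sketch: the parameter values, the seed, and the function names are assumptions chosen for the demonstration. It generates the increments directly as independent normals and applies the estimator $S^2/h$.

```python
import math
import random
import statistics

def estimate_sigma2(increments, h):
    # S^2 / h, where S^2 is the sample variance of W_1, ..., W_N.
    return statistics.variance(increments) / h

def simulate_increments(mu, sigma2, t, h, rng):
    # Increments of Brownian motion with drift mu over steps of length h are
    # independent normals with mean mu*h and variance sigma2*h.
    n = math.floor(t / h + 1e-9)   # N = [t/h], guarding against rounding error
    sd = math.sqrt(sigma2 * h)
    return [rng.gauss(mu * h, sd) for _ in range(n)]

rng = random.Random(12345)  # fixed seed so the run is reproducible
sigma2 = 0.09
for h in (0.01, 0.001, 0.0001):
    w = simulate_increments(mu=0.5, sigma2=sigma2, t=2.0, h=h, rng=rng)
    print(h, len(w), estimate_sigma2(w, h))
```

The observation interval is the same in every run; only the sampling step $h$ shrinks, and the estimate should settle near the assumed value 0.09.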


Equation (10.12) is not the only way in which options can be priced so that no 
arbitrage is possible. Let $\{X(t), 0 \le t \le T\}$ be any stochastic process satisfying, 
for $s < t$, 

$$E[e^{-\alpha t}X(t) \mid X(u), 0 \le u \le s] = e^{-\alpha s}X(s) \tag{10.13}$$

(that is, Equation (10.9) is satisfied). By setting $c$, the cost of an option to purchase 
one share of the stock at time $t$ for price $K$, equal to 

$$c = E[e^{-\alpha t}(X(t) - K)^+] \tag{10.14}$$

it follows that no arbitrage is possible. 

Another type of stochastic process, aside from geometric Brownian motion, 
that satisfies Equation (10.13) is obtained as follows. Let $Y_1, Y_2, \ldots$ be a sequence 
of independent random variables having a common mean $\mu$, and suppose that 
this process is independent of $\{N(t), t \ge 0\}$, which is a Poisson process with 
rate $\lambda$. Let 

$$X(t) = x_0 \prod_{i=1}^{N(t)} Y_i$$

Using the identity 

$$X(t) = x_0 \prod_{i=1}^{N(s)} Y_i \prod_{j=N(s)+1}^{N(t)} Y_j$$

and the independent increment assumption of the Poisson process, we see that, 
for $s < t$, 

$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)\, E\Bigg[\prod_{j=N(s)+1}^{N(t)} Y_j\Bigg]$$

Conditioning on the number of events between $s$ and $t$ yields 

$$E\Bigg[\prod_{j=N(s)+1}^{N(t)} Y_j\Bigg] = \sum_{n=0}^{\infty} \mu^n e^{-\lambda(t - s)}[\lambda(t - s)]^n/n! = e^{-\lambda(t - s)(1 - \mu)}$$



Hence, 

$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)e^{-\lambda(t - s)(1 - \mu)}$$

Thus, if we choose $\lambda$ and $\mu$ to satisfy 

$$\lambda(1 - \mu) = -\alpha$$

then Equation (10.13) is satisfied. Therefore, if for any value of $\lambda$ we let the 
$Y_i$ have any distributions with a common mean equal to $\mu = 1 + \alpha/\lambda$ and 
then price the options according to Equation (10.14), then no arbitrage is 
possible. 
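As a numerical sanity check of the conditioning step, the sketch below sums the Poisson series directly and compares it with $e^{\alpha\tau}$ when $\mu = 1 + \alpha/\lambda$, where $\tau = t - s$. The function name and the sample values of $\alpha$, $\lambda$, and $\tau$ are assumptions made for illustration.

```python
import math

def expected_growth_factor(lam, mu, tau, terms=200):
    # E[product of the Y_j over the events in an interval of length tau],
    # obtained by conditioning on the Poisson number n of events:
    # the sum over n of mu^n * exp(-lam*tau) * (lam*tau)^n / n!.
    total = 0.0
    term = math.exp(-lam * tau)            # the n = 0 term
    for n in range(terms):
        total += term
        term *= mu * lam * tau / (n + 1)   # move from term n to term n + 1
    return total

alpha, lam, tau = 0.05, 3.0, 2.0
mu = 1.0 + alpha / lam                     # the choice lam * (1 - mu) = -alpha
print(expected_growth_factor(lam, mu, tau), math.exp(alpha * tau))
```

The two printed values should agree to many decimal places, so the discounted process has constant conditional expectation, as Equation (10.13) requires.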


Remark If $\{X(t), t \ge 0\}$ satisfies Equation (10.13), then the process $\{e^{-\alpha t}X(t), 
t \ge 0\}$ is called a Martingale. Thus, any pricing of options for which the expected 
gain on the option is equal to 0 when $\{e^{-\alpha t}X(t)\}$ follows the probability law of 
some Martingale will result in no arbitrage possibilities. 

That is, if we choose any Martingale process $\{Z(t)\}$ and let the cost of a $(t, K)$ 
option be 

$$c = E[e^{-\alpha t}(e^{\alpha t}Z(t) - K)^+] = E[(Z(t) - Ke^{-\alpha t})^+]$$

then there is no sure win. 

In addition, while we did not consider the type of wager where a stock that is 
purchased at time $s$ is sold not at a fixed time $t$ but rather at some random time 
that depends on the movement of the stock, it can be shown using results about 
Martingales that the expected return of such wagers is also equal to 0. 


Remark A variation of the arbitrage theorem was first noted by de Finetti in 
1937. A more general version of de Finetti’s result, of which the arbitrage theorem 
is a special case, is given in Reference 3. 


10.5 White Noise 


Let $\{X(t), t \ge 0\}$ denote a standard Brownian motion process and let $f$ be a 
function having a continuous derivative in the region $[a, b]$. The stochastic integral 
$\int_a^b f(t)\, dX(t)$ is defined as follows: 

$$\int_a^b f(t)\, dX(t) \equiv \lim_{\substack{n \to \infty \\ \max(t_i - t_{i-1}) \to 0}} \sum_{i=1}^{n} f(t_{i-1})[X(t_i) - X(t_{i-1})] \tag{10.15}$$

where $a = t_0 < t_1 < \cdots < t_n = b$ is a partition of the region $[a, b]$. Using the 
identity (the integration by parts formula applied to sums) 

$$\sum_{i=1}^{n} f(t_{i-1})[X(t_i) - X(t_{i-1})] = f(b)X(b) - f(a)X(a) - \sum_{i=1}^{n} X(t_i)[f(t_i) - f(t_{i-1})]$$

we see that 

$$\int_a^b f(t)\, dX(t) = f(b)X(b) - f(a)X(a) - \int_a^b X(t)\, df(t) \tag{10.16}$$

Equation (10.16) is usually taken as the definition of $\int_a^b f(t)\, dX(t)$. 
By using the right side of Equation (10.16) we obtain, upon assuming the 
interchangeability of expectation and limit, that 

$$E\left[\int_a^b f(t)\, dX(t)\right] = 0$$

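Also, 

$$\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{n} f(t_{i-1})[X(t_i) - X(t_{i-1})]\right) &= \sum_{i=1}^{n} f^2(t_{i-1}) \operatorname{Var}[X(t_i) - X(t_{i-1})] \\
&= \sum_{i=1}^{n} f^2(t_{i-1})(t_i - t_{i-1})
\end{aligned}$$

where the top equality follows from the independent increments of Brownian 
motion. Hence, we obtain from Equation (10.15) upon taking limits of the 
preceding that 

$$\operatorname{Var}\left[\int_a^b f(t)\, dX(t)\right] = \int_a^b f^2(t)\, dt$$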
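The mean and variance formulas can be illustrated by simulating the approximating sums of Equation (10.15). The sketch below is a rough Monte Carlo check under assumed choices ($f(t) = t$ on $[0, 1]$, a uniform partition, a fixed seed); the function name is illustrative.

```python
import math
import random
import statistics

def stochastic_integral_sample(f, a, b, n, rng):
    # One realization of sum f(t_{i-1}) [X(t_i) - X(t_{i-1})] on a uniform
    # partition; standard Brownian increments are independent N(0, dt).
    dt = (b - a) / n
    sd = math.sqrt(dt)
    return sum(f(a + i * dt) * rng.gauss(0.0, sd) for i in range(n))

rng = random.Random(2024)  # fixed seed so the run is reproducible
samples = [stochastic_integral_sample(lambda t: t, 0.0, 1.0, 100, rng)
           for _ in range(2000)]
print(statistics.mean(samples))      # should be near 0
print(statistics.variance(samples))  # should be near the integral of t^2 on [0, 1], i.e. 1/3
```

With only 2000 sample paths the agreement is approximate; the sample variance is expected to land within a few hundredths of $1/3$.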


Remark The preceding gives operational meaning to the family of quantities 
$\{dX(t), 0 \le t < \infty\}$ by viewing it as an operator that carries functions $f$ into 
the values $\int_a^b f(t)\, dX(t)$. This is called a white noise transformation, or more 
loosely $\{dX(t), 0 \le t < \infty\}$ is called white noise since it can be imagined that a 
time varying function $f$ travels through a white noise medium to yield the output 
(at time $b$) $\int_a^b f(t)\, dX(t)$. 




Example 10.4 Consider a particle of unit mass that is suspended in a liquid 
and suppose that, due to the liquid, there is a viscous force that retards the 
velocity of the particle at a rate proportional to its present velocity. In addition, 
let us suppose that the velocity instantaneously changes according to a constant 
multiple of white noise. That is, if V(t) denotes the particle’s velocity at t, suppose 
that 


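$$V'(t) = -\beta V(t) + \alpha X'(t)$$

where $\{X(t), t \ge 0\}$ is standard Brownian motion. This can be written as follows: 

$$e^{\beta t}[V'(t) + \beta V(t)] = \alpha e^{\beta t} X'(t)$$

or 

$$\frac{d}{dt}[e^{\beta t} V(t)] = \alpha e^{\beta t} X'(t)$$

Hence, upon integration, we obtain 

$$e^{\beta t} V(t) = V(0) + \alpha \int_0^t e^{\beta s} X'(s)\, ds$$

or 

$$V(t) = V(0)e^{-\beta t} + \alpha \int_0^t e^{-\beta(t - s)}\, dX(s)$$

Hence, from Equation (10.16), 

$$V(t) = V(0)e^{-\beta t} + \alpha\left[X(t) - \int_0^t X(s)\beta e^{-\beta(t - s)}\, ds\right]$$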


10.6 Gaussian Processes 


We start with the following definition. 


Definition 10.2 A stochastic process $X(t)$, $t \ge 0$ is called a Gaussian, or a 
normal, process if $X(t_1), \ldots, X(t_n)$ has a multivariate normal distribution for all 
$t_1, \ldots, t_n$. 


If $\{X(t), t \ge 0\}$ is a Brownian motion process, then because each of $X(t_1)$, 
$X(t_2), \ldots, X(t_n)$ can be expressed as a linear combination of the independent 
normal random variables $X(t_1), X(t_2) - X(t_1), X(t_3) - X(t_2), \ldots, X(t_n) - X(t_{n-1})$ 
it follows that Brownian motion is a Gaussian process. 
Because a multivariate normal distribution is completely determined by the 
marginal mean values and the covariance values (see Section 2.6) it follows that 
standard Brownian motion could also be defined as a Gaussian process having 
$E[X(t)] = 0$ and, for $s \le t$, 

$$\begin{aligned}
\operatorname{Cov}(X(s), X(t)) &= \operatorname{Cov}(X(s), X(s) + X(t) - X(s)) \\
&= \operatorname{Cov}(X(s), X(s)) + \operatorname{Cov}(X(s), X(t) - X(s)) \\
&= \operatorname{Cov}(X(s), X(s)) \quad \text{by independent increments} \\
&= s \quad \text{since } \operatorname{Var}(X(s)) = s
\end{aligned} \tag{10.17}$$

Let $\{X(t), t \ge 0\}$ be a standard Brownian motion process and consider the 
process values between 0 and 1 conditional on $X(1) = 0$. That is, consider the 
conditional stochastic process $\{X(t), 0 \le t \le 1 \mid X(1) = 0\}$. Since the conditional 
distribution of $X(t_1), \ldots, X(t_n)$ is multivariate normal it follows that this 
conditional process, known as the Brownian bridge (as it is tied down both at 0 and 
at 1), is a Gaussian process. Let us compute its covariance function. As, from 
Equation (10.4), 

$$E[X(s) \mid X(1) = 0] = 0, \quad \text{for } s < 1$$

we have that, for $s < t < 1$, 

$$\begin{aligned}
\operatorname{Cov}[(X(s), X(t)) \mid X(1) = 0] &= E[X(s)X(t) \mid X(1) = 0] \\
&= E[E[X(s)X(t) \mid X(t), X(1) = 0] \mid X(1) = 0] \\
&= E[X(t)E[X(s) \mid X(t)] \mid X(1) = 0] \\
&= E\left[X(t)\frac{s}{t}X(t) \,\Big|\, X(1) = 0\right] \quad \text{by (10.4)} \\
&= \frac{s}{t}E[X^2(t) \mid X(1) = 0] \\
&= \frac{s}{t}\,t(1 - t) \quad \text{by (10.4)} \\
&= s(1 - t)
\end{aligned}$$

Thus, the Brownian bridge can be defined as a Gaussian process with mean value 0 
and covariance function $s(1 - t)$, $s \le t$. This leads to an alternative approach to 
obtaining such a process. 


Proposition 10.1 If $\{X(t), t \ge 0\}$ is standard Brownian motion, then $\{Z(t), 0 \le 
t \le 1\}$ is a Brownian bridge process when $Z(t) = X(t) - tX(1)$. 

Proof. As it is immediate that $\{Z(t), t \ge 0\}$ is a Gaussian process, all we need 
verify is that $E[Z(t)] = 0$ and $\operatorname{Cov}(Z(s), Z(t)) = s(1 - t)$, when $s \le t$. The former 
is immediate and the latter follows from 

$$\begin{aligned}
\operatorname{Cov}(Z(s), Z(t)) &= \operatorname{Cov}(X(s) - sX(1), X(t) - tX(1)) \\
&= \operatorname{Cov}(X(s), X(t)) - t\operatorname{Cov}(X(s), X(1)) \\
&\quad - s\operatorname{Cov}(X(1), X(t)) + st\operatorname{Cov}(X(1), X(1)) \\
&= s - st - st + st \\
&= s(1 - t)
\end{aligned}$$

and the proof is complete. 
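The covariance computation in the proof uses only bilinearity and $\operatorname{Cov}(X(u), X(v)) = \min(u, v)$, so it can be replayed exactly with rational arithmetic. The helper names below are illustrative, and the sketch assumes $0 < s \le t < 1$.

```python
from fractions import Fraction

def bm_cov(u, v):
    # Equation (10.17): Cov(X(u), X(v)) = min(u, v) for standard Brownian motion.
    return min(u, v)

def cov_of_combinations(a, b):
    # a and b map time points to coefficients, e.g. {s: 1, 1: -s} stands for
    # X(s) - s X(1). Covariance is bilinear, so sum over all pairs of terms.
    return sum(ca * cb * bm_cov(u, v) for u, ca in a.items() for v, cb in b.items())

def bridge(t):
    # Z(t) = X(t) - t X(1) as a linear combination of X(t) and X(1); needs t < 1.
    return {t: Fraction(1), Fraction(1): -t}

s, t = Fraction(1, 4), Fraction(2, 3)
print(cov_of_combinations(bridge(s), bridge(t)), s * (1 - t))
```

Both printed values are $1/12$ for this choice of $s$ and $t$.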


If $\{X(t), t \ge 0\}$ is Brownian motion, then the process $\{Z(t), t \ge 0\}$ defined by 

$$Z(t) = \int_0^t X(s)\, ds \tag{10.18}$$

is called integrated Brownian motion. As an illustration of how such a process 
may arise in practice, suppose we are interested in modeling the price of a 
commodity throughout time. Letting $Z(t)$ denote the price at $t$ then, rather than 
assuming that $\{Z(t)\}$ is Brownian motion (or that $\log Z(t)$ is Brownian motion), 
we might want to assume that the rate of change of $Z(t)$ follows a Brownian 
motion. For instance, we might suppose that the rate of change of the commodity’s 
price is the current inflation rate, which is imagined to vary as Brownian 
motion. Hence, 

$$\frac{d}{dt}Z(t) = X(t)$$

or 

$$Z(t) = Z(0) + \int_0^t X(s)\, ds$$

It follows from the fact that Brownian motion is a Gaussian process that 
$\{Z(t), t \ge 0\}$ is also Gaussian. To prove this, first recall that $W_1, \ldots, W_n$ is 
said to have a multivariate normal distribution if they can be represented as 

$$W_i = \sum_{j=1}^{m} a_{ij}U_j, \quad i = 1, \ldots, n$$

where $U_j$, $j = 1, \ldots, m$ are independent normal random variables. From this it 
follows that any set of partial sums of $W_1, \ldots, W_n$ are also jointly normal. The 
fact that $Z(t_1), \ldots, Z(t_n)$ is multivariate normal can now be shown by writing 
the integral in Equation (10.18) as a limit of approximating sums. 
As $\{Z(t), t \ge 0\}$ is Gaussian it follows that its distribution is characterized by its 
mean value and covariance function. We now compute these when $\{X(t), t \ge 0\}$ 
is standard Brownian motion. 

$$E[Z(t)] = E\left[\int_0^t X(s)\, ds\right] = \int_0^t E[X(s)]\, ds = 0$$

For $s \le t$, 

$$\begin{aligned}
\operatorname{Cov}[Z(s), Z(t)] &= E[Z(s)Z(t)] \\
&= E\left[\int_0^s X(y)\, dy \int_0^t X(u)\, du\right] \\
&= E\left[\int_0^s \int_0^t X(y)X(u)\, du\, dy\right] \\
&= \int_0^s \int_0^t E[X(y)X(u)]\, du\, dy \\
&= \int_0^s \int_0^t \min(y, u)\, du\, dy \quad \text{by (10.17)} \\
&= \int_0^s \left(\int_0^y u\, du + \int_y^t y\, du\right) dy = s^2\left(\frac{t}{2} - \frac{s}{6}\right)
\end{aligned}$$

10.7 Stationary and Weakly Stationary Processes 


A stochastic process $\{X(t), t \ge 0\}$ is said to be a stationary process if for all 
$n, s, t_1, \ldots, t_n$ the random vectors $X(t_1), \ldots, X(t_n)$ and $X(t_1 + s), \ldots, X(t_n + s)$ 
have the same joint distribution. In other words, a process is stationary if, in 
choosing any fixed point $s$ as the origin, the ensuing process has the same 
probability law. Two examples of stationary processes are: 

(i) An ergodic continuous-time Markov chain $\{X(t), t \ge 0\}$ when 
$P\{X(0) = j\} = P_j$, $j \ge 0$, 
where $\{P_j, j \ge 0\}$ are the limiting probabilities. 
(ii) $\{X(t), t \ge 0\}$ when $X(t) = N(t + L) - N(t)$, $t \ge 0$, where $L > 0$ is a fixed constant 
and $\{N(t), t \ge 0\}$ is a Poisson process having rate $\lambda$. 

The first one of these processes is stationary for it is a Markov chain whose 
initial state is chosen according to the limiting probabilities, and it can thus be 
regarded as an ergodic Markov chain that we start observing at time $\infty$. Hence, 
the continuation of this process at time $s$ after observation begins is just the 
continuation of the chain starting at time $\infty + s$, which clearly has the same probability 
law for all $s$. That the second example (where $X(t)$ represents the number of events of 
a Poisson process that occur between $t$ and $t + L$) is stationary follows from the 
stationary and independent increment assumption of the Poisson process, which 
implies that the continuation of a Poisson process at any time $s$ remains a Poisson 
process. 


Example 10.5 (The Random Telegraph Signal Process) Let $\{N(t), t \ge 0\}$ denote 
a Poisson process with rate $\lambda$, and let $X_0$ be independent of this process and be such that 
$P\{X_0 = 1\} = P\{X_0 = -1\} = \frac{1}{2}$. Defining $X(t) = X_0(-1)^{N(t)}$, then $\{X(t), t \ge 0\}$ 
is called a random telegraph signal process. To see that it is stationary, note first 
that starting at any time $t$, no matter what the value of $N(t)$, as $X_0$ is equally 
likely to be either plus or minus 1, it follows that $X(t)$ is equally likely to be either 
plus or minus 1. Hence, because the continuation of a Poisson process beyond 
any time remains a Poisson process, it follows that $\{X(t), t \ge 0\}$ is a stationary 
process. 

Let us compute the mean and covariance function of the random telegraph 
signal. 

$$\begin{aligned}
E[X(t)] &= E[X_0(-1)^{N(t)}] \\
&= E[X_0]E[(-1)^{N(t)}] \quad \text{by independence} \\
&= 0 \quad \text{since } E[X_0] = 0,
\end{aligned}$$

$$\begin{aligned}
\operatorname{Cov}[X(t), X(t + s)] &= E[X(t)X(t + s)] \\
&= E[X_0^2(-1)^{N(t) + N(t + s)}] \\
&= E[(-1)^{2N(t)}(-1)^{N(t + s) - N(t)}] \\
&= E[(-1)^{N(t + s) - N(t)}] \\
&= E[(-1)^{N(s)}] \\
&= \sum_{i=0}^{\infty} (-1)^i e^{-\lambda s}\frac{(\lambda s)^i}{i!} \\
&= e^{-2\lambda s}
\end{aligned} \tag{10.19}$$


For an application of the random telegraph signal consider a particle moving at 
a constant unit velocity along a straight line and suppose that collisions involving 
this particle occur at a Poisson rate $\lambda$. Also suppose that each time the particle 
suffers a collision it reverses direction. Therefore, if $X_0$ represents the initial 
velocity of the particle, then its velocity at time $t$ (call it $X(t)$) is given by $X(t) = 
X_0(-1)^{N(t)}$, where $N(t)$ denotes the number of collisions involving the particle 
by time $t$. Hence, if $X_0$ is equally likely to be plus or minus 1, and is independent 
of $\{N(t), t \ge 0\}$, then $\{X(t), t \ge 0\}$ is a random telegraph signal process. If we 
now let 

$$D(t) = \int_0^t X(s)\, ds$$

then $D(t)$ represents the displacement of the particle at time $t$ from its position 
at time 0. The mean and variance of $D(t)$ are obtained as follows: 

$$E[D(t)] = \int_0^t E[X(s)]\, ds = 0,$$

$$\begin{aligned}
\operatorname{Var}[D(t)] &= E[D^2(t)] \\
&= E\left[\int_0^t X(y)\, dy \int_0^t X(u)\, du\right] \\
&= \int_0^t \int_0^t E[X(y)X(u)]\, dy\, du \\
&= 2\iint_{0 < y < u < t} E[X(y)X(u)]\, dy\, du \\
&= 2\int_0^t \int_0^u e^{-2\lambda(u - y)}\, dy\, du \quad \text{by (10.19)} \\
&= \frac{1}{\lambda}\left(t - \frac{1}{2\lambda} + \frac{1}{2\lambda}e^{-2\lambda t}\right)
\end{aligned}$$
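The last step of the variance computation can be checked numerically by integrating the covariance $e^{-2\lambda|u - y|}$ over the square $[0, t] \times [0, t]$. The sketch below uses a midpoint rule; the function names, grid size, and sample parameter values are assumptions made for illustration.

```python
import math

def displacement_variance_formula(t, lam):
    # Var[D(t)] = (1/lam) * (t - 1/(2 lam) + exp(-2 lam t) / (2 lam))
    return (t - 1.0 / (2.0 * lam) + math.exp(-2.0 * lam * t) / (2.0 * lam)) / lam

def displacement_variance_numeric(t, lam, n=300):
    # Midpoint rule for the double integral of E[X(y)X(u)] = exp(-2 lam |u - y|)
    # over the square [0, t] x [0, t].
    h = t / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        for j in range(n):
            u = (j + 0.5) * h
            total += math.exp(-2.0 * lam * abs(u - y))
    return total * h * h

print(displacement_variance_numeric(2.0, 1.5), displacement_variance_formula(2.0, 1.5))
```

For large $t$ the variance grows like $t/\lambda$, while for small $t$ it behaves like $t^2$, since the particle has rarely collided yet.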


The condition for a process to be stationary is rather stringent and so we define 
the process $\{X(t), t \ge 0\}$ to be a second-order stationary or a weakly stationary 
process if $E[X(t)] = c$ and $\operatorname{Cov}[X(t), X(t + s)]$ does not depend on $t$. That is, a 
process is second-order stationary if the first two moments of $X(t)$ are the same 
for all $t$ and the covariance between $X(s)$ and $X(t)$ depends only on $|t - s|$. For 
a second-order stationary process, let 

$$R(s) = \operatorname{Cov}[X(t), X(t + s)]$$

As the finite dimensional distributions of a Gaussian process (being multivariate 
normal) are determined by their means and covariance, it follows that a 
second-order stationary Gaussian process is stationary. 


Example 10.6 (The Ornstein-Uhlenbeck Process) Let $\{X(t), t \ge 0\}$ be a standard 
Brownian motion process, and define, for $\alpha > 0$, 

$$V(t) = e^{-\alpha t/2}X(e^{\alpha t})$$

The process $\{V(t), t \ge 0\}$ is called the Ornstein-Uhlenbeck process. It has been 
proposed as a model for describing the velocity of a particle immersed in a liquid 
or gas, and as such is useful in statistical mechanics. Let us compute its mean and 
covariance function. 

$$E[V(t)] = 0,$$

$$\begin{aligned}
\operatorname{Cov}[V(t), V(t + s)] &= e^{-\alpha t/2}e^{-\alpha(t + s)/2}\operatorname{Cov}[X(e^{\alpha t}), X(e^{\alpha(t + s)})] \\
&= e^{-\alpha t}e^{-\alpha s/2}e^{\alpha t} \quad \text{by Equation (10.17)} \\
&= e^{-\alpha s/2}
\end{aligned}$$

Hence, $\{V(t), t \ge 0\}$ is weakly stationary and as it is clearly a Gaussian process 
(since Brownian motion is Gaussian) we can conclude that it is stationary. It is 
interesting to note that (with $\alpha = 4\lambda$) it has the same mean and covariance 
function as the random telegraph signal process, thus illustrating that two quite 
different processes can have the same second-order properties. (Of course, if two 
Gaussian processes have the same mean and covariance functions then they are 
identically distributed.) 


As the following examples show, there are many types of second-order station- 
ary processes that are not stationary. 


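Example 10.7 (An Autoregressive Process) Let $Z_0, Z_1, Z_2, \ldots$ be uncorrelated 
random variables with $E[Z_n] = 0$, $n \ge 0$ and 

$$\operatorname{Var}(Z_n) = \begin{cases} \sigma^2/(1 - \lambda^2), & n = 0 \\ \sigma^2, & n \ge 1 \end{cases}$$

where $\lambda^2 < 1$. Define 

$$\begin{aligned}
X_0 &= Z_0, \\
X_n &= \lambda X_{n-1} + Z_n, \quad n \ge 1
\end{aligned} \tag{10.20}$$

The process $\{X_n, n \ge 0\}$ is called a first-order autoregressive process. It says that 
the state at time $n$ (that is, $X_n$) is a constant multiple of the state at time $n - 1$ 
plus a random error term $Z_n$. 
Iterating Equation (10.20) yields 

$$\begin{aligned}
X_n &= \lambda(\lambda X_{n-2} + Z_{n-1}) + Z_n \\
&= \lambda^2 X_{n-2} + \lambda Z_{n-1} + Z_n \\
&\;\;\vdots \\
&= \sum_{i=0}^{n} \lambda^{n-i}Z_i
\end{aligned}$$

and so 

$$\begin{aligned}
\operatorname{Cov}(X_n, X_{n+m}) &= \operatorname{Cov}\left(\sum_{i=0}^{n} \lambda^{n-i}Z_i, \sum_{i=0}^{n+m} \lambda^{n+m-i}Z_i\right) \\
&= \sum_{i=0}^{n} \lambda^{n-i}\lambda^{n+m-i}\operatorname{Cov}(Z_i, Z_i) \\
&= \sigma^2\lambda^{2n+m}\left(\frac{1}{1 - \lambda^2} + \sum_{i=1}^{n} \lambda^{-2i}\right) \\
&= \frac{\sigma^2\lambda^m}{1 - \lambda^2}
\end{aligned}$$

where the preceding uses the fact that $Z_i$ and $Z_j$ are uncorrelated when $i \ne j$. 
As $E[X_n] = 0$, we see that $\{X_n, n \ge 0\}$ is weakly stationary (the definition for 
a discrete time process is the obvious analog of that given for continuous time 
processes). 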
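The fact that the covariance does not depend on $n$ relies on the special variance chosen for $Z_0$. The sketch below evaluates the covariance sum exactly with rational arithmetic and compares it with $\sigma^2\lambda^m/(1 - \lambda^2)$; the function name and the sample values of $\lambda$ and $\sigma^2$ are assumptions made for illustration.

```python
from fractions import Fraction

def ar1_covariance(lam, sigma2, n, m):
    # X_n = sum_{i=0}^{n} lam^(n-i) Z_i with uncorrelated Z_i, so only matching
    # indices contribute: sum over i of lam^(n-i) * lam^(n+m-i) * Var(Z_i).
    def var_z(i):
        return sigma2 / (1 - lam**2) if i == 0 else sigma2
    return sum(lam ** (n - i) * lam ** (n + m - i) * var_z(i) for i in range(n + 1))

lam, sigma2 = Fraction(1, 2), Fraction(3)
checks = [
    ar1_covariance(lam, sigma2, n, m) == sigma2 * lam**m / (1 - lam**2)
    for n in (0, 1, 5)
    for m in (0, 1, 4)
]
print(all(checks))  # True when the covariance depends only on the lag m
```

Replacing the variance of $Z_0$ by $\sigma^2$ in `var_z` would make the result depend on $n$, so the process would no longer be weakly stationary.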


Example 10.8 If, in the random telegraph signal process, we drop the requirement 
that $P\{X_0 = 1\} = P\{X_0 = -1\} = \frac{1}{2}$ and only require that $E[X_0] = 0$, then 
the process $\{X(t), t \ge 0\}$ need no longer be stationary. (It will remain stationary 
if $X_0$ has a symmetric distribution in the sense that $-X_0$ has the same distribution 
as $X_0$.) However, the process will be weakly stationary since 

$$E[X(t)] = E[X_0]E[(-1)^{N(t)}] = 0,$$

$$\begin{aligned}
\operatorname{Cov}[X(t), X(t + s)] &= E[X(t)X(t + s)] \\
&= E[X_0^2]E[(-1)^{N(t) + N(t + s)}] \\
&= E[X_0^2]e^{-2\lambda s} \quad \text{from (10.19)}
\end{aligned}$$


Example 10.9 Let $W_0, W_1, W_2, \ldots$ be uncorrelated with $E[W_n] = \mu$ and 
$\operatorname{Var}(W_n) = \sigma^2$, $n \ge 0$, and for some positive integer $k$ define 

$$X_n = \frac{W_n + W_{n-1} + \cdots + W_{n-k}}{k + 1}, \quad n \ge k$$

The process $\{X_n, n \ge k\}$, which at each time keeps track of the arithmetic average 
of the most recent $k + 1$ values of the $W$s, is called a moving average process. 
Using the fact that the $W_n$, $n \ge 0$ are uncorrelated, we see that 

$$\operatorname{Cov}(X_n, X_{n+m}) = \begin{cases} \dfrac{(k + 1 - m)\sigma^2}{(k + 1)^2}, & \text{if } 0 \le m \le k \\ 0, & \text{if } m > k \end{cases}$$

Hence, $\{X_n, n \ge k\}$ is a second-order stationary process. 

Let $\{X_n, n \ge 1\}$ be a second-order stationary process with $E[X_n] = \mu$. An 
important question is when, if ever, does $\bar{X}_n \equiv \sum_{i=1}^{n} X_i/n$ converge to $\mu$? The 
following proposition, which we state without proof, shows that $E[(\bar{X}_n - \mu)^2] \to 
0$ if and only if $\sum_{i=1}^{n} R(i)/n \to 0$. That is, the expected square of the difference 
between $\bar{X}_n$ and $\mu$ will converge to 0 if and only if the limiting average value of 
$R(i)$ converges to 0. 


Proposition 10.2 Let $\{X_n, n \ge 1\}$ be a second-order stationary process having 
mean $\mu$ and covariance function $R(i) = \operatorname{Cov}(X_n, X_{n+i})$, and let $\bar{X}_n \equiv \sum_{i=1}^{n} X_i/n$. 
Then $\lim_{n \to \infty} E[(\bar{X}_n - \mu)^2] = 0$ if and only if $\lim_{n \to \infty} \sum_{i=1}^{n} R(i)/n = 0$. 

Suppose that the stochastic processes $\{X(t), -\infty < t < \infty\}$ and $\{Y(t), -\infty < 
t < \infty\}$ are related as follows: 

$$Y(t) = \int_{-\infty}^{\infty} X(t - s)h(s)\, ds \tag{10.21}$$

We can imagine that a signal, whose value at time $t$ is $X(t)$, is passed through a 
physical system that distorts its value so that $Y(t)$, the received value at $t$, is given 
by Equation (10.21). The processes $\{X(t)\}$ and $\{Y(t)\}$ are called, respectively, 
the input and output processes. The function $h$ is called the impulse response 
function. If $h(s) = 0$ whenever $s < 0$, then $h$ is also called a weighting function 
since Equation (10.21) expresses the output at $t$ as a weighted integral of all 
the inputs prior to $t$ with $h(s)$ representing the weight given the input $s$ time 
units ago. 

The relationship expressed by Equation (10.21) is a special case of a time 
invariant linear filter. It is called a filter because we can imagine that the input 
process $\{X(t)\}$ is passed through a medium and then filtered to yield the output 
process $\{Y(t)\}$. It is a linear filter because if the input processes $\{X_i(t)\}$, $i = 1, 2$, 
result in the output processes $\{Y_i(t)\}$ (that is, if $Y_i(t) = \int_{-\infty}^{\infty} X_i(t - s)h(s)\, ds$) then 
the output process corresponding to the input process $\{aX_1(t) + bX_2(t)\}$ is just 
$\{aY_1(t) + bY_2(t)\}$. It is called time invariant since lagging the input process by a 
time $\tau$ (that is, considering the new input process $\bar{X}(t) = X(t + \tau)$) results in a 
lag of $\tau$ in the output process since 

$$\int_{-\infty}^{\infty} \bar{X}(t - s)h(s)\, ds = \int_{-\infty}^{\infty} X(t + \tau - s)h(s)\, ds = Y(t + \tau)$$

Let us now suppose that the input process $\{X(t), -\infty < t < \infty\}$ is weakly 
stationary with $E[X(t)] = 0$ and covariance function 

$$R_X(s) = \operatorname{Cov}[X(t), X(t + s)]$$

Let us compute the mean value and covariance function of the output 
process $\{Y(t)\}$. 

Assuming that we can interchange the expectation and integration operations 
(a sufficient condition being that $\int |h(s)|\, ds < \infty$* and, for some $M < \infty$, $E|X(t)| < 
M$ for all $t$) we obtain 

$$E[Y(t)] = \int E[X(t - s)]h(s)\, ds = 0$$

Similarly, 

$$\begin{aligned}
\operatorname{Cov}[Y(t_1), Y(t_2)] &= \operatorname{Cov}\left[\int X(t_1 - s_1)h(s_1)\, ds_1, \int X(t_2 - s_2)h(s_2)\, ds_2\right] \\
&= \iint \operatorname{Cov}[X(t_1 - s_1), X(t_2 - s_2)]h(s_1)h(s_2)\, ds_1\, ds_2 \\
&= \iint R_X(t_2 - s_2 - t_1 + s_1)h(s_1)h(s_2)\, ds_1\, ds_2
\end{aligned} \tag{10.22}$$

Hence, $\operatorname{Cov}[Y(t_1), Y(t_2)]$ depends on $t_1, t_2$ only through $t_2 - t_1$, thus showing 
that $\{Y(t)\}$ is also weakly stationary. 

The preceding expression for $R_Y(t_2 - t_1) = \operatorname{Cov}[Y(t_1), Y(t_2)]$ is, however, 
more compactly and usefully expressed in terms of Fourier transforms of $R_X$ and 
$R_Y$. Let, for $i = \sqrt{-1}$, 

$$\tilde{R}_X(w) = \int e^{-iws}R_X(s)\, ds$$

and 

$$\tilde{R}_Y(w) = \int e^{-iws}R_Y(s)\, ds$$

denote the Fourier transforms, respectively, of $R_X$ and $R_Y$. The function $\tilde{R}_X(w)$ 
is also called the power spectral density of the process $\{X(t)\}$. Also, let 

$$\tilde{h}(w) = \int e^{-iws}h(s)\, ds$$

denote the Fourier transform of the function $h$. Then, from Equation (10.22), 

$$\begin{aligned}
\tilde{R}_Y(w) &= \iiint e^{-iws}R_X(s - s_2 + s_1)h(s_1)h(s_2)\, ds_1\, ds_2\, ds \\
&= \iiint e^{-iw(s - s_2 + s_1)}R_X(s - s_2 + s_1)\, ds\; e^{-iws_2}h(s_2)\, ds_2\; e^{iws_1}h(s_1)\, ds_1 \\
&= \tilde{R}_X(w)\tilde{h}(w)\tilde{h}(-w)
\end{aligned} \tag{10.23}$$

*The range of all integrals in this section is from $-\infty$ to $+\infty$. 

Now, using the representation 

$$e^{ix} = \cos x + i\sin x,$$

$$e^{-ix} = \cos(-x) + i\sin(-x) = \cos x - i\sin x$$

we obtain 

$$\begin{aligned}
\tilde{h}(w)\tilde{h}(-w) &= \left[\int h(s)\cos(ws)\, ds - i\int h(s)\sin(ws)\, ds\right] \\
&\quad \times \left[\int h(s)\cos(ws)\, ds + i\int h(s)\sin(ws)\, ds\right] \\
&= \left[\int h(s)\cos(ws)\, ds\right]^2 + \left[\int h(s)\sin(ws)\, ds\right]^2 \\
&= \left|\int h(s)e^{-iws}\, ds\right|^2 \\
&= |\tilde{h}(w)|^2
\end{aligned}$$

Hence, from Equation (10.23) we obtain 

$$\tilde{R}_Y(w) = \tilde{R}_X(w)|\tilde{h}(w)|^2$$

In words, the Fourier transform of the covariance function of the output process 
is equal to the square of the amplitude of the Fourier transform of the impulse 
response function multiplied by the Fourier transform of the covariance function of the 
input process. 
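The identity $\tilde{h}(w)\tilde{h}(-w) = |\tilde{h}(w)|^2$ for a real-valued $h$ can be illustrated numerically. The sketch below assumes a hypothetical impulse response $h(s) = e^{-s}$ for $s \ge 0$ (and 0 otherwise), for which $\tilde{h}(w) = 1/(1 + iw)$ and hence $|\tilde{h}(w)|^2 = 1/(1 + w^2)$. The truncation point and grid size are likewise assumptions.

```python
import cmath
import math

def fourier_transform(h, w, upper=40.0, n=40000):
    # Midpoint rule for the integral of exp(-i w s) h(s) ds over [0, upper];
    # h is assumed to vanish for s < 0 and to be negligible beyond `upper`.
    ds = upper / n
    return sum(
        cmath.exp(-1j * w * (k + 0.5) * ds) * h((k + 0.5) * ds) for k in range(n)
    ) * ds

def h(s):
    # Hypothetical impulse response used only for this illustration.
    return math.exp(-s)

w = 1.7
hw, h_minus_w = fourier_transform(h, w), fourier_transform(h, -w)
print(hw * h_minus_w, abs(hw) ** 2, 1.0 / (1.0 + w * w))
```

Under this assumed $h$, the output power spectral density at frequency $w$ is the input density scaled by $1/(1 + w^2)$, so high frequencies are attenuated.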


Exercises 


In the following exercises $\{B(t), t \ge 0\}$ is a standard Brownian motion process 
and $T_a$ denotes the time it takes this process to hit $a$. 

*1. What is the distribution of $B(s) + B(t)$, $s \le t$? 

2. Compute the conditional distribution of $B(s)$ given that $B(t_1) = A$ and $B(t_2) = B$, 
where $0 < t_1 < s < t_2$. 

*3. Compute $E[B(t_1)B(t_2)B(t_3)]$ for $t_1 < t_2 < t_3$. 

4. Show that 

$$P\{T_a < \infty\} = 1, \qquad E[T_a] = \infty, \quad a \ne 0$$

*5. What is $P\{T_1 < T_{-1} < T_2\}$? 


6. Suppose you own one share of a stock whose price changes according to a standard 
Brownian motion process. Suppose that you purchased the stock at a price $b + c$, 
$c > 0$, and the present price is $b$. You have decided to sell the stock either when 
it reaches the price $b + c$ or when an additional time $t$ goes by (whichever occurs 
first). What is the probability that you do not recover your purchase price? 

7. Compute an expression for 

$$P\Big\{\max_{t_1 \le s \le t_2} B(s) > x\Big\}$$

8. Consider the random walk that in each $\Delta t$ time unit either goes up or down the 
amount $\sqrt{\Delta t}$ with respective probabilities $p$ and $1 - p$, where $p = \frac{1}{2}(1 + \mu\sqrt{\Delta t})$. 
(a) Argue that as $\Delta t \to 0$ the resulting limiting process is a Brownian motion 
process with drift rate $\mu$. 
(b) Using part (a) and the results of the gambler’s ruin problem (Section 4.5.1), 
compute the probability that a Brownian motion process with drift rate $\mu$ goes 
up $A$ before going down $B$, $A > 0$, $B > 0$. 

9. Let $\{X(t), t \ge 0\}$ be a Brownian motion process with drift coefficient $\mu$ and variance 
parameter $\sigma^2$. What is the joint density function of $X(s)$ and $X(t)$, $s < t$? 

*10. Let $\{X(t), t \ge 0\}$ be a Brownian motion process with drift coefficient $\mu$ and variance 
parameter $\sigma^2$. What is the conditional distribution of $X(t)$ given that $X(s) = c$ 
when 
(a) $s < t$? 
(b) $t < s$? 

11. Consider a process whose value changes every $h$ time units; its new value being its 
old value multiplied either by the factor $e^{\sigma\sqrt{h}}$ with probability $p = \frac{1}{2}(1 + \frac{\mu}{\sigma}\sqrt{h})$, 
or by the factor $e^{-\sigma\sqrt{h}}$ with probability $1 - p$. As $h$ goes to zero, show that this 
process converges to geometric Brownian motion with drift coefficient $\mu$ and variance 
parameter $\sigma^2$. 

12. A stock is presently selling at a price of \$50 per share. After one time period, its 
selling price will (in present value dollars) be either \$150 or \$25. An option to 
purchase $y$ units of the stock at time 1 can be purchased at cost $cy$. 
(a) What should $c$ be in order for there to be no sure win? 
(b) If $c = 4$, explain how you could guarantee a sure win. 
(c) If $c = 10$, explain how you could guarantee a sure win. 
(d) Use the arbitrage theorem to verify your answer to part (a). 

13. Verify the statement made in the remark following Example 10.2. 

14. The present price of a stock is 100. The price at time 1 will be either 50, 100, or 
200. An option to purchase $y$ shares of the stock at time 1 for the (present value) 
price $ky$ costs $cy$. 
(a) If $k = 120$, show that an arbitrage opportunity occurs if and only if $c > 80/3$. 
(b) If $k = 80$, show that there is not an arbitrage opportunity if and only if $20 \le 
c \le 40$. 

15. The current price of a stock is 100. Suppose that the logarithm of the price of the 
stock changes according to a Brownian motion process with drift coefficient $\mu = 2$ 
and variance parameter $\sigma^2 = 1$. Give the Black-Scholes cost of an option to buy 
the stock at time 10 for a cost of 
(a) 100 per unit. 
(b) 120 per unit. 
(c) 80 per unit. 
Assume that the continuously compounded interest rate is 5 percent. 


A stochastic process \(\{Y(t), t \ge 0\}\) is said to be a Martingale process if, for s < t,

\[ E[Y(t) \mid Y(u),\ 0 \le u \le s] = Y(s) \]

16. If \(\{Y(t), t \ge 0\}\) is a Martingale, show that

\[ E[Y(t)] = E[Y(0)] \]

17. Show that standard Brownian motion is a Martingale.

18. Show that \(\{Y(t), t \ge 0\}\) is a Martingale when

\[ Y(t) = B^2(t) - t \]

What is E[Y(t)]?
Hint: First compute \(E[Y(t) \mid B(u),\ 0 \le u \le s]\).

*19. Show that \(\{Y(t), t \ge 0\}\) is a Martingale when

\[ Y(t) = \exp\{cB(t) - c^2 t/2\} \]

where c is an arbitrary constant. What is E[Y(t)]?

An important property of a Martingale is that if you continually observe the process and
then stop at some time T, then, subject to some technical conditions (which will hold in
the problems to be considered),

\[ E[Y(T)] = E[Y(0)] \]

The time T usually depends on the values of the process and is known as a stopping time
for the Martingale. This result, that the expected value of the stopped Martingale is equal
to its fixed time expectation, is known as the Martingale stopping theorem.


*20. Let

\[ T = \min\{t : B(t) = 2 - 4t\} \]

That is, T is the first time that standard Brownian motion hits the line 2 − 4t. Use
the Martingale stopping theorem to find E[T].

21. Let \(\{X(t), t \ge 0\}\) be Brownian motion with drift coefficient \(\mu\) and variance
parameter \(\sigma^2\). That is,

\[ X(t) = \sigma B(t) + \mu t \]

Let \(\mu > 0\), and for a positive constant x let

\[ T = \min\{t : X(t) = x\} = \min\Bigl\{t : B(t) = \frac{x - \mu t}{\sigma}\Bigr\} \]

That is, T is the first time the process \(\{X(t), t \ge 0\}\) hits x. Use the Martingale
stopping theorem to show that

\[ E[T] = x/\mu \]

22. Let \(X(t) = \sigma B(t) + \mu t\), and for given positive constants A and B, let p denote the
probability that \(\{X(t), t \ge 0\}\) hits A before it hits −B.

(a) Define the stopping time T to be the first time the process hits either A or −B.
Use this stopping time and the Martingale defined in Exercise 19 to show that

\[ E[\exp\{c(X(T) - \mu T)/\sigma - c^2 T/2\}] = 1 \]

(b) Let \(c = -2\mu/\sigma\), and show that

\[ E[\exp\{-2\mu X(T)/\sigma^2\}] = 1 \]

(c) Use part (b) and the definition of T to find p.
Hint: What are the possible values of \(\exp\{-2\mu X(T)/\sigma^2\}\)?

23. Let \(X(t) = \sigma B(t) + \mu t\), and define T to be the first time the process \(\{X(t), t \ge 0\}\)
hits either A or −B, where A and B are given positive numbers. Use the Martingale
stopping theorem and part (c) of Exercise 22 to find E[T].

*24. Let \(\{X(t), t \ge 0\}\) be Brownian motion with drift coefficient \(\mu\) and variance
parameter \(\sigma^2\). Suppose that \(\mu > 0\). Let x > 0 and define the stopping time T (as in
Exercise 21) by

\[ T = \min\{t : X(t) = x\} \]

Use the Martingale defined in Exercise 18, along with the result of Exercise 21, to
show that

\[ \mathrm{Var}(T) = x\sigma^2/\mu^3 \]

25. Compute the mean and variance of

(a) \(\int_0^1 t \, dB(t)\)

(b) \(\int_0^1 t^2 \, dB(t)\)

26. Let Y(t) = tB(1/t), t > 0 and Y(0) = 0.

(a) What is the distribution of Y(t)?

(b) Compute Cov(Y(s), Y(t)).

(c) Argue that \(\{Y(t), t \ge 0\}\) is a standard Brownian motion process.

27. Let \(Y(t) = B(a^2 t)/a\) for a > 0. Argue that {Y(t)} is a standard Brownian motion
process.

28. For s < t, argue that \(B(s) - \frac{s}{t}B(t)\) and B(t) are independent.

29. Let \(\{Z(t), t \ge 0\}\) denote a Brownian bridge process. Show that if

\[ Y(t) = (t + 1)\, Z\bigl(t/(t + 1)\bigr) \]

then \(\{Y(t), t \ge 0\}\) is a standard Brownian motion process.
30. Let X(t) = N(t + 1) − N(t) where \(\{N(t), t \ge 0\}\) is a Poisson process with rate \(\lambda\).
Compute

\[ \mathrm{Cov}[X(t), X(t + s)] \]

*31. Let \(\{N(t), t \ge 0\}\) denote a Poisson process with rate \(\lambda\) and define Y(t) to be the
time from t until the next Poisson event.

(a) Argue that \(\{Y(t), t \ge 0\}\) is a stationary process.

(b) Compute Cov[Y(t), Y(t + s)].

32. Let \(\{X(t), -\infty < t < \infty\}\) be a weakly stationary process having covariance function
\(R_X(s) = \mathrm{Cov}[X(t), X(t + s)]\).

(a) Show that

\[ \mathrm{Var}(X(t + s) - X(t)) = 2R_X(0) - 2R_X(s) \]

(b) If Y(t) = X(t + 1) − X(t) show that \(\{Y(t), -\infty < t < \infty\}\) is also weakly
stationary having a covariance function \(R_Y(s) = \mathrm{Cov}[Y(t), Y(t + s)]\) that
satisfies

\[ R_Y(s) = 2R_X(s) - R_X(s - 1) - R_X(s + 1) \]

33. Let \(Y_1\) and \(Y_2\) be independent unit normal random variables and for some constant
w set

\[ X(t) = Y_1 \cos wt + Y_2 \sin wt, \qquad -\infty < t < \infty \]

(a) Show that {X(t)} is a weakly stationary process.

(b) Argue that {X(t)} is a stationary process.

34. Let \(\{X(t), -\infty < t < \infty\}\) be weakly stationary with covariance function
R(s) = Cov(X(t), X(t + s)) and let \(\tilde{R}(w)\) denote the power spectral density of the process.

(i) Show that \(\tilde{R}(w) = \tilde{R}(-w)\). It can be shown that

\[ R(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{R}(w)\, e^{iws}\, dw \]

(ii) Use the preceding to show that

\[ \int_{-\infty}^{\infty} \tilde{R}(w)\, dw = 2\pi E[X^2(0)] \]
11.1 Introduction


Let \(\mathbf{X} = (X_1, \ldots, X_n)\) denote a random vector having a given density function
\(f(x_1, \ldots, x_n)\) and suppose we are interested in computing

\[ E[g(\mathbf{X})] = \int\!\!\int\!\cdots\!\int g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \]

for some n-dimensional function g. For instance, g could represent the total delay
in queue of the first [n/2] customers when the X values represent the first [n/2]
interarrival and service times.* In many situations, it is not analytically possible
either to compute the preceding multiple integral exactly or even to numerically
approximate it within a given accuracy. One possibility that remains is to
approximate \(E[g(\mathbf{X})]\) by means of simulation.

*We are using the notation [a] to represent the largest integer less than or equal to a.

To approximate \(E[g(\mathbf{X})]\), start by generating a random vector
\(\mathbf{X}^{(1)} = (X_1^{(1)}, \ldots, X_n^{(1)})\) having the joint density \(f(x_1, \ldots, x_n)\) and then compute
\(Y^{(1)} = g(\mathbf{X}^{(1)})\). Now generate a second random vector (independent of the first)
\(\mathbf{X}^{(2)}\) and compute \(Y^{(2)} = g(\mathbf{X}^{(2)})\). Keep on doing this until r, a fixed number,
of independent and identically distributed random variables \(Y^{(i)} = g(\mathbf{X}^{(i)})\),
i = 1, ..., r have been generated. Now by the strong law of large numbers, we
know that

\[ \lim_{r \to \infty} \frac{Y^{(1)} + \cdots + Y^{(r)}}{r} = E[Y^{(i)}] = E[g(\mathbf{X})] \]


and so we can use the average of the generated Ys as an estimate of E[g(X)]. This 
approach to estimating E[g(X)] is called the Monte Carlo simulation approach. 
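
A minimal sketch of the Monte Carlo approach in Python follows. The choice of g(x) = x_1 x_2 with X uniform on the unit square is an illustrative assumption (picked only because E[g(X)] = 1/4 is known exactly), and the names `monte_carlo_estimate`, `sample_unit_square`, and `product` are hypothetical helpers rather than anything defined in the text.

```python
import random


def monte_carlo_estimate(g, sample_x, r, rng):
    """Average g over r independent draws of the random vector X."""
    total = 0.0
    for _ in range(r):
        total += g(sample_x(rng))
    return total / r


def sample_unit_square(rng):
    # Assumed example distribution: two independent uniform (0, 1) coordinates.
    return (rng.random(), rng.random())


def product(x):
    # Assumed example integrand g(x1, x2) = x1 * x2, for which E[g(X)] = 1/4.
    return x[0] * x[1]


rng = random.Random(12345)
estimate = monte_carlo_estimate(product, sample_unit_square, 100_000, rng)
print(estimate)  # should be close to 0.25
```

By the strong law of large numbers the printed average approaches 1/4 as r grows; the seed is fixed only to make the run repeatable.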

Clearly there remains the problem of how to generate, or simulate, random
vectors having a specified joint distribution. The first step in doing this is to be
able to generate random variables from a uniform distribution on (0, 1). One
way to do this would be to take 10 identical slips of paper, numbered 0, 1, ..., 9,
place them in a hat and then successively select n slips, with replacement, from
the hat. The sequence of digits obtained (with a decimal point in front) can be
regarded as the value of a uniform (0, 1) random variable rounded off to the
nearest \((1/10)^n\). For instance, if the sequence of digits selected is 3, 8, 7, 2, 1,
then the value of the uniform (0, 1) random variable is 0.38721 (to the nearest
0.00001). Tables of the values of uniform (0, 1) random variables, known as
random number tables, have been extensively published (for instance, see The
RAND Corporation, A Million Random Digits with 100,000 Normal Deviates
(New York: The Free Press, 1955)). Table 11.1 is such a table.

However, this is not the way in which digital computers simulate uniform
(0, 1) random variables. In practice, they use pseudo random numbers instead of
truly random ones. Most random number generators start with an initial value
\(X_0\), called the seed, and then recursively compute values by specifying positive
integers a, c, and m, and then letting

\[ X_{n+1} = (aX_n + c) \text{ modulo } m, \qquad n \ge 0 \]

where the preceding means that \(aX_n + c\) is divided by m and the remainder
is taken as the value of \(X_{n+1}\). Thus each \(X_n\) is either 0, 1, ..., or m − 1 and the
quantity \(X_n/m\) is taken as an approximation to a uniform (0, 1) random variable.
It can be shown that subject to suitable choices for a, c, m, the preceding gives
rise to a sequence of numbers that looks as if it were generated from independent
uniform (0, 1) random variables.
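
The recursion can be sketched in a few lines of Python. The helper names `lcg_states` and `take` are hypothetical, and the first parameter set (a = 5, c = 3, m = 16) is a deliberately tiny, poor-quality choice assumed here only so the arithmetic can be checked by hand. The second set (a = 7^5, c = 0, m = 2^31 − 1) is a classical multiplicative choice, shown as an illustration rather than a recommendation.

```python
def lcg_states(seed, a, c, m):
    """Yield X_1, X_2, ... where X_{n+1} = (a * X_n + c) mod m and X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def take(gen, n):
    """Collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]


# Toy parameters: easy to verify by hand, far too small for real use.
toy_states = take(lcg_states(seed=1, a=5, c=3, m=16), 5)
print(toy_states)                        # [8, 11, 10, 5, 12]
print([x / 16 for x in toy_states])      # the corresponding values X_n / m

# A classical multiplicative generator (c = 0), shown for scale only.
m = 2**31 - 1
big_states = take(lcg_states(seed=1, a=7**5, c=0, m=m), 2)
print([x / m for x in big_states])
```

Each state lies in {0, 1, ..., m − 1}, so every ratio X_n/m lies in [0, 1).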

As our starting point in the simulation of random variables from an arbitrary 
distribution, we shall suppose that we can simulate from the uniform (0, 1) distri- 
bution, and we shall use the term “random numbers” to mean independent ran- 
dom variables from this distribution. In Sections 11.2 and 11.3 we present both 
general and special techniques for simulating continuous random variables; and 
in Section 11.4 we do the same for discrete random variables. In Section 11.5 we 
discuss the simulation both of jointly distributed random variables and stochas- 
tic processes. Particular attention is given to the simulation of nonhomogeneous 
Table 11.1 A Random Number Table


04839 96423 24878 82651 66566 14778 76797 14780 13300 87074 
68086 26432 46901 20848 89768 81536 86645 12659 92259 57102 
39064 66432 84673 40027 32832 61362 98947 96067 64760 64584 
25669 26422 44407 44048 37937 63904 45766 66134 75470 66520 
64117 94305 26766 25940 39972 22209 71500 64568 91402 42416 
87917 77341 42206 35126 74087 99547 81817 42607 43808 76655 
62797 56170 86324 88072 76222 36086 84637 93161 76038 65855 
95876 55293 18988 27354 26575 08625 40801 59920 29841 80150 
29888 88604 67917 48708 18912 82271 65424 69774 33611 54262 
73577 12908 30883 18317 28290 35797 05998 41688 34952 37888 
27958 30134 04024 86385 29880 99730 55536 84855 29080 09250 
90999 49127 20044 59931 06115 20542 18059 02008 73708 83517 
18845 49618 02304 51038 20655 58727 28168 15475 56942 53389 
94824 78171 84610 82834 09922 25417 44137 48413 25555 21246 
35605 81263 39667 47358 56873 56307 61607 49518 89356 20103 
33362 64270 01638 92477 66969 98420 04880 45585 46565 04102 
88720 82765 34476 17032 87589 40836 32427 70002 70663 88863 
39475 46473 23219 53416 94970 25832 69975 94884 19661 72828 
06990 67245 68350 82948 11398 42878 80287 88267 47363 46634 
40980 07391 58745 25774 22987 80059 39911 96189 41151 14222 
83974 29992 65381 38857 50490 83765 55657 14361 31720 57375 
33339 31926 14883 24413 59744 92351 97473 89286 35931 04110 
31662 25388 61642 34072 81249 35648 56891 69352 48373 45578 
93526 70765 10592 04542 76463 54328 02349 17247 28865 14777 
20492 38391 91132 21999 59516 81652 27195 48223 46751 22923 
04153 53381 79401 21438 83035 92350 36693 31238 59649 91754 
05520 91962 04739 13092 97662 24822 94730 06496 35090 04822 
47498 87637 99016 71060 88824 71013 18735 20286 23153 72924 
23167 49323 45021 33132 12544 41035 80780 45393 44812 12515 
23792 14422 15059 45799 22716 19792 09983 74353 68668 30429 
85900 98275 32388 52390 16815 69298 82732 38480 73817 32523 
42559 78985 05300 22164 24369 54224 35083 19687 11062 91491 
14349 82674 66523 44133 00697 35552 35970 19124 63318 29686 
17403 53363 44167 64486 64758 75366 76554 31601 12614 33072 
23632 27889 47914 02584 37680 20801 72152 39339 34806 08930 


Poisson processes, and in fact three different approaches for this are discussed. 
Simulation of two-dimensional Poisson processes is discussed in Section 11.5.2. 
In Section 11.6 we discuss various methods for increasing the precision of the 
simulation estimates by reducing their variance; and in Section 11.7 we consider 
the problem of choosing the number of simulation runs needed to attain a desired 
level of precision. Before beginning this program, however, let us consider two 
applications of simulation to combinatorial problems. 
Example 11.1 (Generating a Random Permutation) Suppose we are interested
in generating a permutation of the numbers 1, 2, ..., n that is such that all n!
possible orderings are equally likely. The following algorithm will accomplish
this by first choosing one of the numbers 1, ..., n at random and then putting
that number in position n; it then chooses at random one of the remaining n − 1
numbers and puts that number in position n − 1; it then chooses at random one
of the remaining n − 2 numbers and puts it in position n − 2, and so on (where
choosing a number at random means that each of the remaining numbers is
equally likely to be chosen). However, so that we do not have to consider exactly
which of the numbers remain to be positioned, it is convenient and efficient
to keep the numbers in an ordered list and then randomly choose the position
of the number rather than the number itself. That is, starting with any initial
ordering \(p_1, p_2, \ldots, p_n\), we pick one of the positions 1, ..., n at random and then
interchange the number in that position with the one in position n. Now we
randomly choose one of the positions 1, ..., n − 1 and interchange the number
in this position with the one in position n − 1, and so on.

To implement the preceding, we need to be able to generate a random variable
that is equally likely to take on any of the values 1, 2, ..., k. To accomplish this, let
U denote a random number (that is, U is uniformly distributed over (0, 1)) and
note that kU is uniform on (0, k) and so

\[ P\{i - 1 < kU < i\} = \frac{1}{k}, \qquad i = 1, \ldots, k \]

Hence, the random variable I = [kU] + 1 will be such that

\[ P\{I = i\} = P\{[kU] = i - 1\} = P\{i - 1 < kU < i\} = \frac{1}{k} \]

The preceding algorithm for generating a random permutation can now be
written as follows:

Step 1: Let \(p_1, p_2, \ldots, p_n\) be any permutation of 1, 2, ..., n (for instance, we can choose
\(p_j = j\), j = 1, ..., n).

Step 2: Set k = n.

Step 3: Generate a random number U and let I = [kU] + 1.

Step 4: Interchange the values of \(p_I\) and \(p_k\).

Step 5: Let k = k − 1 and if k > 1 go to step 3.

Step 6: \(p_1, \ldots, p_n\) is the desired random permutation.


For instance, suppose n = 4 and the initial permutation is 1, 2, 3, 4. If the first
value of I (which is equally likely to be either 1, 2, 3, 4) is I = 3, then the new
permutation is 1, 2, 4, 3. If the next value of I is I = 2 then the new permutation
is 1, 4, 2, 3. If the final value of I is I = 2, then the final permutation is 1, 4, 2,
3, and this is the value of the random permutation.
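
Steps 1 through 6 can be sketched in Python as follows. The function names and the `next_u` argument (any zero-argument callable returning a random number) are assumptions made for illustration; passing the random numbers in explicitly lets the worked example above be replayed exactly.

```python
import math
import random


def random_index(k, u):
    """Return I = [k*u] + 1, which is equally likely to be any of 1, ..., k."""
    return math.floor(k * u) + 1


def random_permutation(n, next_u):
    """Generate a random permutation of 1, ..., n following Steps 1-6."""
    p = list(range(1, n + 1))                    # Step 1: p_j = j
    k = n                                        # Step 2
    while k > 1:
        i = random_index(k, next_u())            # Step 3
        p[i - 1], p[k - 1] = p[k - 1], p[i - 1]  # Step 4 (Python lists are 0-based)
        k -= 1                                   # Step 5
    return p                                     # Step 6


# Replay the worked example: these three random numbers give I = 3, 2, 2.
replay = iter([0.6, 0.5, 0.9])
print(random_permutation(4, lambda: next(replay)))  # [1, 4, 2, 3]

# Ordinary use with Python's built-in generator.
print(random_permutation(10, random.Random(1).random))
```

For n = 1 the loop body never runs, which differs from the literal Step 3 only in that no random number is consumed.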
One very important property of the preceding algorithm is that it can also be
used to generate a random subset, say of size r, of the integers 1, ..., n. Namely,
just follow the algorithm until the positions n, n − 1, ..., n − r + 1 are filled. The
elements in these positions constitute the random subset. ■
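
A short Python sketch of this early-stopping variant is given below. The name `random_subset` and the `next_u` callable are hypothetical, introduced only for illustration.

```python
import math
import random


def random_subset(n, r, next_u):
    """Run the permutation algorithm only until positions n, ..., n-r+1 are filled."""
    p = list(range(1, n + 1))
    for k in range(n, n - r, -1):             # k = n, n-1, ..., n-r+1
        i = math.floor(k * next_u()) + 1      # position chosen uniformly from 1, ..., k
        p[i - 1], p[k - 1] = p[k - 1], p[i - 1]
    return set(p[n - r:])                     # the elements in the last r positions


# With the random numbers 0.6 and 0.5 (giving I = 3, then I = 2) and n = 4, r = 2,
# the list becomes 1, 4, 2, 3 and the subset is taken from positions 3 and 4.
replay = iter([0.6, 0.5])
print(random_subset(4, 2, lambda: next(replay)))  # {2, 3}

print(random_subset(52, 5, random.Random(7).random))
```

Only r random numbers are used, so the cost is proportional to r once the initial list is available.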


Example 11.2 (Estimating the Number of Distinct Entries in a Large List) Consider
a list of n entries where n is very large, and suppose we are interested in
estimating d, the number of distinct elements in the list. If we let m(i) denote the
number of times that the element in position i appears on the list, then we can
express d by

\[ d = \sum_{i=1}^{n} \frac{1}{m(i)} \]

To estimate d, suppose that we generate a random value X equally likely to be
either 1, 2, ..., n (that is, we take X = [nU] + 1) and then let m(X) denote the
number of times the element in position X appears on the list. Then

\[ E\left[\frac{1}{m(X)}\right] = \sum_{i=1}^{n} \frac{1}{m(i)}\,\frac{1}{n} = \frac{d}{n} \]

Hence, if we generate k such random variables \(X_1, \ldots, X_k\) we can estimate d by

\[ d \approx \frac{n \sum_{i=1}^{k} 1/m(X_i)}{k} \]
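
The estimator of d can be sketched in Python as below. The function name `estimate_distinct` is hypothetical. The sketch also assumes, purely for convenience, that the counts m(i) are precomputed with `collections.Counter`; in a genuinely large list one would instead count only the occurrences of each sampled element.

```python
import math
import random
from collections import Counter


def estimate_distinct(items, k, rng):
    """Estimate the number of distinct entries from k random positions."""
    n = len(items)
    counts = Counter(items)                  # m(i), precomputed only for this demo
    total = 0.0
    for _ in range(k):
        x = math.floor(n * rng.random())     # 0-based version of X = [nU] + 1
        total += 1.0 / counts[items[x]]
    return n * total / k


# Assumed test list: 50 distinct values, each appearing exactly 20 times,
# so 1/m(X) = 1/20 for every draw and the estimate equals d = 50.
items = [i % 50 for i in range(1000)]
estimate = estimate_distinct(items, 2000, random.Random(3))
print(estimate)
```

When the multiplicities differ across elements the estimate is random, and its accuracy improves as k grows.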


Suppose now that each item in the list has a value attached to it, v(i) being the
value of the ith element. The sum of the values of the distinct items (call it v) can
be expressed as

\[ v = \sum_{i=1}^{n} \frac{v(i)}{m(i)} \]

Now if X = [nU] + 1, where U is a random number, then

\[ E\left[\frac{v(X)}{m(X)}\right] = \sum_{i=1}^{n} \frac{v(i)}{m(i)}\,\frac{1}{n} = \frac{v}{n} \]

Hence, we can estimate v by generating \(X_1, \ldots, X_k\) and then estimating v by

\[ v \approx \frac{n}{k} \sum_{i=1}^{k} \frac{v(X_i)}{m(X_i)} \]

For an important application of the preceding, let \(A_i = \{a_{i,1}, \ldots, a_{i,n_i}\}\), i = 1, ..., s,
denote events, and suppose we are interested in estimating \(P\bigl(\bigcup_{i=1}^{s} A_i\bigr)\). Since

\[ P\Bigl(\bigcup_{i=1}^{s} A_i\Bigr) = \sum_{a \in \cup A_i} P(a) = \sum_{i=1}^{s} \sum_{j=1}^{n_i} \frac{P(a_{i,j})}{m(a_{i,j})} \]

where \(m(a_{i,j})\) is the number of events to which the point \(a_{i,j}\) belongs, the preceding
method can be used to estimate \(P\bigl(\bigcup_{i=1}^{s} A_i\bigr)\).

Note that the preceding procedure for estimating v can be effected without
prior knowledge of the set of values \(\{v_1, \ldots, v_n\}\). That is, it suffices that we can
determine the value of an element in a specific place and the number of times
that element appears on the list. When the set of values is a priori known, there
is another approach available as will be shown in Example 11.11. ■


11.2 General Techniques for Simulating Continuous 
Random Variables 


In this section we present three methods for simulating continuous random 
variables. 


11.2.1 The Inverse Transformation Method 


A general method for simulating a random variable having a continuous distribu- 
tion—called the inverse transformation method—is based on the following 
proposition. 


Proposition 11.1 Let U be a uniform (0, 1) random variable. For any continuous
distribution function F if we define the random variable X by

\[ X = F^{-1}(U) \]

then the random variable X has distribution function F. (\(F^{-1}(u)\) is defined to
equal that value x for which F(x) = u.)

Proof.

\[
\begin{aligned}
F_X(a) &= P\{X \le a\} \\
       &= P\{F^{-1}(U) \le a\}
\end{aligned}
\tag{11.1}
\]

Now, since F(x) is a monotone function, it follows that \(F^{-1}(U) \le a\) if and only
if \(U \le F(a)\). Hence, from Equation (11.1), we see that

\[
\begin{aligned}
F_X(a) &= P\{U \le F(a)\} \\
       &= F(a)
\end{aligned}
\]
■

Hence, we can simulate a random variable X from the continuous distribution
F, when \(F^{-1}\) is computable, by simulating a random number U and then
setting \(X = F^{-1}(U)\).


Example 11.3 (Simulating an Exponential Random Variable) If \(F(x) = 1 - e^{-x}\),
then \(F^{-1}(u)\) is that value of x such that

\[ 1 - e^{-x} = u \]

or

\[ x = -\log(1 - u) \]

Hence, if U is a uniform (0, 1) variable, then

\[ F^{-1}(U) = -\log(1 - U) \]

is exponentially distributed with mean 1. Since 1 − U is also uniformly distributed
on (0, 1) it follows that −log U is exponential with mean 1. Since cX is exponential
with mean c when X is exponential with mean 1, it follows that −c log U
is exponential with mean c. ■
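
As a small sketch of the inverse transformation method for this example, the Python function below returns −c log U. The name `exponential` is hypothetical. One implementation detail is an assumption of this sketch: Python's `random()` can return exactly 0, so the code uses 1 − U (which lies in (0, 1] and is equally uniform) to avoid taking log 0.

```python
import math
import random


def exponential(mean, rng):
    """Simulate an exponential random variable with the given mean as -mean * log(U)."""
    u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is always defined
    return -mean * math.log(u)


rng = random.Random(2024)
sample = [exponential(3.0, rng) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean)  # should be close to the mean c = 3
```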


11.2.2 The Rejection Method 


Suppose that we have a method for simulating a random variable having density 
function g(x). We can use this as the basis for simulating from the continuous 
distribution having density f(x) by simulating Y from g and then accepting this 
simulated value with a probability proportional to f(Y)/g(Y). 

Specifically, let c be a constant such that

\[ \frac{f(y)}{g(y)} \le c \qquad \text{for all } y \]

We then have the following technique for simulating a random variable having
density f.


Rejection Method


Step 1: Simulate Y having density g and simulate a random number U.
Step 2: If \(U \le f(Y)/cg(Y)\) set X = Y. Otherwise return to step 1.


Proposition 11.2 The random variable X generated by the rejection method has 
density function f. 
Proof. Let X be the value obtained, and let N denote the number of necessary
iterations. Then

\[
\begin{aligned}
P\{X \le x\} &= P\{Y_N \le x\} \\
&= P\{Y \le x \mid U \le f(Y)/cg(Y)\} \\
&= \frac{P\{Y \le x,\ U \le f(Y)/cg(Y)\}}{K} \\
&= \frac{\int P\{Y \le x,\ U \le f(Y)/cg(Y) \mid Y = y\}\, g(y)\, dy}{K} \\
&= \frac{\int_{-\infty}^{x} \bigl(f(y)/cg(y)\bigr)\, g(y)\, dy}{K} \\
&= \frac{\int_{-\infty}^{x} f(y)\, dy}{Kc}
\end{aligned}
\]

where \(K = P\{U \le f(Y)/cg(Y)\}\). Letting \(x \to \infty\) shows that K = 1/c and the
proof is complete. ■

Remarks

(i) The preceding method was originally presented by Von Neumann in the special case
where g was positive only in some finite interval (a, b), and Y was chosen to be
uniform over (a, b) (that is, Y = a + (b − a)U).

(ii) Note that the way in which we "accept the value Y with probability f(Y)/cg(Y)"
is by generating a uniform (0, 1) random variable U and then accepting Y if
\(U \le f(Y)/cg(Y)\).

(iii) Since each iteration of the method will, independently, result in an accepted value
with probability \(P\{U \le f(Y)/cg(Y)\} = 1/c\) it follows that the number of iterations
is geometric with mean c.

(iv) Actually, it is not necessary to generate a new uniform random number when
deciding whether or not to accept, since at a cost of some additional computation, a
single random number, suitably modified at each iteration, can be used throughout.
To see how, note that the actual value of U is not used, only whether or not
\(U \le f(Y)/cg(Y)\). Hence, if Y is rejected (that is, if \(U > f(Y)/cg(Y)\)) we can use
the fact that, given Y,

\[ \frac{U - f(Y)/cg(Y)}{1 - f(Y)/cg(Y)} = \frac{cUg(Y) - f(Y)}{cg(Y) - f(Y)} \]

is uniform on (0, 1). Hence, this may be used as a uniform random number in the
next iteration. As this saves the generation of a random number at the cost of the
preceding computation, whether it is a net savings depends greatly upon the method
being used to generate random numbers.
Example 11.4 Let us use the rejection method to generate a random variable
having density function

\[ f(x) = 20x(1 - x)^3, \qquad 0 < x < 1 \]

Since this random variable (which is beta with parameters 2, 4) is concentrated
in the interval (0, 1), let us consider the rejection method with

\[ g(x) = 1, \qquad 0 < x < 1 \]

To determine the constant c such that \(f(x)/g(x) \le c\), we use calculus to determine
the maximum value of

\[ \frac{f(x)}{g(x)} = 20x(1 - x)^3 \]

Differentiation of this quantity yields

\[ \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = 20\left[(1 - x)^3 - 3x(1 - x)^2\right] \]

Setting this equal to 0 shows that the maximal value is attained when x = 1/4 and
thus

\[ \frac{f(x)}{g(x)} \le 20\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^3 = \frac{135}{64} \equiv c \]

Hence,

\[ \frac{f(x)}{cg(x)} = \frac{256}{27}\, x(1 - x)^3 \]

and thus the rejection procedure is as follows:

Step 1: Generate random numbers \(U_1\) and \(U_2\).

Step 2: If \(U_2 \le \frac{256}{27} U_1(1 - U_1)^3\), stop and set \(X = U_1\). Otherwise return to step 1.

The average number of times that step 1 will be performed is c = 135/64. ■
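
The rejection method, applied to this example, might be sketched in Python as follows. The generic helper `rejection_sample` and its argument names are assumptions made for illustration; it also returns the number of iterations used so that the claim about the mean number of iterations can be checked empirically.

```python
import random


def rejection_sample(f, g, sample_g, c, rng):
    """Return (X, iterations), where X has density f and f(y)/g(y) <= c for all y."""
    iterations = 0
    while True:
        iterations += 1
        y = sample_g(rng)               # Step 1: Y with density g ...
        u = rng.random()                # ... and a random number U
        if u <= f(y) / (c * g(y)):      # Step 2: accept with probability f(Y)/cg(Y)
            return y, iterations


def beta_2_4_density(x):
    return 20.0 * x * (1.0 - x) ** 3


C = 135.0 / 64.0
rng = random.Random(7)
draws = [
    rejection_sample(beta_2_4_density, lambda x: 1.0, lambda r: r.random(), C, rng)
    for _ in range(50_000)
]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_iterations = sum(n for _, n in draws) / len(draws)
print(mean_x)           # a beta(2, 4) variable has mean 2/6 = 1/3
print(mean_iterations)  # should be close to c = 135/64, about 2.11
```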


Example 11.5 (Simulating a Normal Random Variable) To simulate a standard
normal random variable Z (that is, one with mean 0 and variance 1) note first
that the absolute value of Z has density function

\[ f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad 0 < x < \infty \tag{11.2} \]

We will start by simulating from the preceding density by using the rejection
method with

\[ g(x) = e^{-x}, \qquad 0 < x < \infty \]

Now, note that

\[ \frac{f(x)}{g(x)} = \sqrt{2e/\pi}\, \exp\{-(x - 1)^2/2\} \le \sqrt{2e/\pi} \]

so that we may take \(c = \sqrt{2e/\pi}\), which gives \(f(x)/cg(x) = \exp\{-(x - 1)^2/2\}\).
Hence, using the rejection method we can simulate from Equation (11.2) as
follows:

(a) Generate independent random variables Y and U, Y being exponential with rate 1
and U being uniform on (0, 1).
(b) If \(U \le \exp\{-(Y - 1)^2/2\}\), or equivalently, if

\[ -\log U \ge (Y - 1)^2/2 \]

set X = Y. Otherwise return to step (a).

Once we have simulated a random variable X having Density Function (11.2) we
can then generate a standard normal random variable Z by letting Z be equally
likely to be either X or −X.

To improve upon the foregoing, note first that from Example 11.3 it follows
that −log U will also be exponential with rate 1. Hence, steps (a) and (b) are
equivalent to the following:

(a′) Generate independent exponentials with rate 1, \(Y_1\) and \(Y_2\).
(b′) Set \(X = Y_1\) if \(Y_2 \ge (Y_1 - 1)^2/2\). Otherwise return to step (a′).

Now suppose that we accept step (b′). It then follows by the lack of memory
property of the exponential that the amount by which \(Y_2\) exceeds \((Y_1 - 1)^2/2\)
will also be exponential with rate 1.

Hence, summing up, we have the following algorithm which generates an
exponential with rate 1 and an independent standard normal random variable:

Step 1: Generate \(Y_1\), an exponential random variable with rate 1.

Step 2: Generate \(Y_2\), an exponential with rate 1.

Step 3: If \(Y_2 - (Y_1 - 1)^2/2 > 0\), set \(Y = Y_2 - (Y_1 - 1)^2/2\) and go to step 4. Otherwise
go to step 1.

Step 4: Generate a random number U and set

\[ Z = \begin{cases} Y_1, & \text{if } U \le \tfrac{1}{2} \\ -Y_1, & \text{if } U > \tfrac{1}{2} \end{cases} \]

The random variables Z and Y generated by the preceding are independent with
Z being normal with mean 0 and variance 1 and Y being exponential with rate 1.
(If we want the normal random variable to have mean \(\mu\) and variance \(\sigma^2\), just
take \(\mu + \sigma Z\).) ■
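
Steps 1 through 4 can be sketched in Python as below. The helper names `exp1` and `normal_and_exponential` are hypothetical, and the rate 1 exponentials are produced by the inverse transformation method of Example 11.3 (using 1 − U to avoid log 0, an implementation choice of this sketch).

```python
import math
import random


def exp1(rng):
    """An exponential random variable with rate 1, generated as -log(U)."""
    return -math.log(1.0 - rng.random())


def normal_and_exponential(rng):
    """Return (Z, Y): a standard normal and an independent rate 1 exponential."""
    while True:
        y1 = exp1(rng)                            # Step 1
        y2 = exp1(rng)                            # Step 2
        excess = y2 - (y1 - 1.0) ** 2 / 2.0       # Step 3
        if excess > 0:
            break
    z = y1 if rng.random() <= 0.5 else -y1        # Step 4: attach a random sign
    return z, excess


rng = random.Random(5)
pairs = [normal_and_exponential(rng) for _ in range(50_000)]
zs = [z for z, _ in pairs]
ys = [y for _, y in pairs]
z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / len(zs)
y_mean = sum(ys) / len(ys)
print(z_mean, z_var, y_mean)  # should be near 0, 1, and 1 respectively
```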


Remarks 


(i) Since \(c = \sqrt{2e/\pi} \approx 1.32\), the preceding requires a geometrically distributed
number of iterations of step 2 with mean 1.32.

(ii) The final random number of step 4 need not be separately simulated but rather can 
be obtained from the first digit of any random number used earlier. That is, suppose 
we generate a random number to simulate an exponential; then we can strip off 
the initial digit of this random number and just use the remaining digits (with the 
decimal point moved one step to the right) as the random number. If this initial digit 
is 0, 1, 2, 3, or 4 (or 0 if the computer is generating binary digits), then we take the 
sign of Z to be positive and take it to be negative otherwise. 

(iii) If we are generating a sequence of standard normal random variables, then we can
use the exponential Y obtained in step 3 as the initial exponential needed in step 1
for the next normal to be generated. Hence, on the average, we can simulate a unit
normal by generating 1.64 (= 2 × 1.32 − 1) exponentials and computing 1.32 squares.


11.2.3 The Hazard Rate Method 


Let F be a continuous distribution function with \(\bar{F}(0) = 1\), where
\(\bar{F}(t) = 1 - F(t)\). Recall that \(\lambda(t)\), the hazard rate function of F, is defined by

\[ \lambda(t) = \frac{f(t)}{\bar{F}(t)}, \qquad t \ge 0 \]

(where f(t) = F′(t) is the density function). Recall also that \(\lambda(t)\) represents the
instantaneous probability intensity that an item having life distribution F will fail
at time t given it has survived to that time.

Suppose now that we are given a bounded function \(\lambda(t)\), such that
\(\int_0^\infty \lambda(t)\, dt = \infty\), and we desire to simulate a random variable S having \(\lambda(t)\) as its
hazard rate function.

To do so let \(\lambda\) be such that

\[ \lambda(t) \le \lambda \qquad \text{for all } t \ge 0 \]

To simulate from \(\lambda(t)\), \(t \ge 0\), we will

(a) simulate a Poisson process having rate \(\lambda\). We will then only "accept" or "count"
certain of these Poisson events. Specifically we will
(b) count an event that occurs at time t, independently of all else, with probability
\(\lambda(t)/\lambda\).

We now have the following proposition.

Proposition 11.3 The time of the first counted event (call it S) is a random
variable whose distribution has hazard rate function \(\lambda(t)\), \(t \ge 0\).
Proof.

\[
\begin{aligned}
P\{t < S < t + dt \mid S > t\}
&= P\{\text{first counted event in } (t, t + dt) \mid \text{no counted events prior to } t\} \\
&= P\{\text{Poisson event in } (t, t + dt), \text{ it is counted} \mid \text{no counted events prior to } t\} \\
&= P\{\text{Poisson event in } (t, t + dt), \text{ it is counted}\} \\
&= [\lambda\, dt + o(dt)]\, \frac{\lambda(t)}{\lambda} \\
&= \lambda(t)\, dt + o(dt)
\end{aligned}
\]

which completes the proof. Note that the next to last equality follows from the
independent increment property of Poisson processes. ■

Because the interarrival times of a Poisson process having rate \(\lambda\) are exponential
with rate \(\lambda\), it thus follows from Example 11.3 and the previous proposition
that the following algorithm will generate a random variable having hazard rate
function \(\lambda(t)\), \(t \ge 0\).

Hazard Rate Method for Generating S: \(\lambda_S(t) = \lambda(t)\)

Let \(\lambda\) be such that \(\lambda(t) \le \lambda\) for all \(t \ge 0\). Generate pairs of random variables
\(U_i, X_i\), \(i \ge 1\), with \(X_i\) being exponential with rate \(\lambda\) and \(U_i\) being uniform (0, 1),
stopping at

\[ N = \min\Bigl\{ n : U_n \le \lambda\Bigl(\sum_{i=1}^{n} X_i\Bigr) \Big/ \lambda \Bigr\} \]

Set

\[ S = \sum_{i=1}^{N} X_i \]

To compute E[N] we need the result, known as Wald's equation, which states
that if \(X_1, X_2, \ldots\) are independent and identically distributed random variables
that are observed in sequence up to some random time N then

\[ E\left[\sum_{i=1}^{N} X_i\right] = E[N]E[X] \]

More precisely let \(X_1, X_2, \ldots\) denote a sequence of independent random variables
and consider the following definition.


Definition 11.1 An integer-valued random variable N is said to be a stopping
time for the sequence \(X_1, X_2, \ldots\) if the event {N = n} is independent of
\(X_{n+1}, X_{n+2}, \ldots\) for all n = 1, 2, ....


Intuitively, we observe the \(X_n\)s in sequential order and N denotes the number
observed before stopping. If N = n, then we have stopped after observing
\(X_1, \ldots, X_n\) and before observing \(X_{n+1}, X_{n+2}, \ldots\) for all n = 1, 2, ....


Example 11.6 Let \(X_n\), n = 1, 2, ..., be independent and such that

\[ P\{X_n = 0\} = P\{X_n = 1\} = \tfrac{1}{2}, \qquad n = 1, 2, \ldots \]

If we let

\[ N = \min\{n : X_1 + \cdots + X_n = 10\} \]

then N is a stopping time. We may regard N as being the stopping time of an
experiment that successively flips a fair coin and then stops when the number of
heads reaches 10. ■
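
For this example the sum \(X_1 + \cdots + X_N\) always equals 10, so Wald's equation (Proposition 11.4) gives 10 = E[N] · (1/2), that is, E[N] = 20. The Python sketch below checks this by simulation; the function name `flips_until_heads` and the number of runs are arbitrary choices made for illustration.

```python
import random


def flips_until_heads(target, rng):
    """Return N, the number of fair-coin flips needed to observe `target` heads."""
    heads = 0
    n = 0
    while heads < target:
        n += 1
        if rng.random() < 0.5:   # X_n = 1 (heads) with probability 1/2
            heads += 1
    return n


rng = random.Random(11)
runs = 20_000
average_n = sum(flips_until_heads(10, rng) for _ in range(runs)) / runs
print(average_n)  # should be close to 20
```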


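Proposition 11.4 (Wald's Equation) If \(X_1, X_2, \ldots\) are independent and identically
distributed random variables having finite expectations, and if N is a stopping
time for \(X_1, X_2, \ldots\) such that \(E[N] < \infty\), then

\[ E\left[\sum_{n=1}^{N} X_n\right] = E[N]E[X] \]

Proof. Letting

\[ I_n = \begin{cases} 1, & \text{if } N \ge n \\ 0, & \text{if } N < n \end{cases} \]

we have

\[ \sum_{n=1}^{N} X_n = \sum_{n=1}^{\infty} X_n I_n \]

Hence,

\[ E\left[\sum_{n=1}^{N} X_n\right] = E\left[\sum_{n=1}^{\infty} X_n I_n\right] = \sum_{n=1}^{\infty} E[X_n I_n] \tag{11.3} \]

However, \(I_n = 1\) if and only if we have not stopped after successively observing
\(X_1, \ldots, X_{n-1}\). Therefore, \(I_n\) is determined by \(X_1, \ldots, X_{n-1}\) and is thus
independent of \(X_n\). From Equation (11.3) we thus obtain

\[
\begin{aligned}
E\left[\sum_{n=1}^{N} X_n\right] &= \sum_{n=1}^{\infty} E[X_n]E[I_n] \\
&= E[X] \sum_{n=1}^{\infty} E[I_n] \\
&= E[X] \sum_{n=1}^{\infty} P\{N \ge n\} \\
&= E[X]E[N]
\end{aligned}
\]
■

Returning to the hazard rate method, we have

\[ S = \sum_{i=1}^{N} X_i \]

As \(N = \min\{n : U_n \le \lambda(\sum_{i=1}^{n} X_i)/\lambda\}\) it follows that the event that N = n is
independent of \(X_{n+1}, X_{n+2}, \ldots\). Hence, by Wald's equation,

\[ E[S] = E[N]E[X_i] = \frac{E[N]}{\lambda} \]

or

\[ E[N] = \lambda E[S] \]

where E[S] is the mean of the desired random variable.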


11.3 Special Techniques for Simulating Continuous 
Random Variables 


Special techniques have been devised to simulate from most of the common con- 
tinuous distributions. We now present certain of these. 
11.3.1 The Normal Distribution 


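Let X and Y denote independent standard normal random variables and thus
have the joint density function

\[ f(x, y) = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}, \qquad -\infty < x < \infty,\ -\infty < y < \infty \]

Consider now the polar coordinates of the point (X, Y). As shown in Figure 11.1,

\[ R^2 = X^2 + Y^2, \qquad \Theta = \tan^{-1}(Y/X) \]

[Figure 11.1: polar coordinates of the point (X, Y), with \(R^2 = X^2 + Y^2\) and \(\Theta = \tan^{-1}(Y/X)\).]

To obtain the joint density of \(R^2\) and \(\Theta\), consider the transformation

\[ d = x^2 + y^2, \qquad \theta = \tan^{-1}(y/x) \]

The Jacobian of this transformation is

\[
J = \begin{vmatrix}
\dfrac{\partial d}{\partial x} & \dfrac{\partial d}{\partial y} \\[2ex]
\dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y}
\end{vmatrix}
= \begin{vmatrix}
2x & 2y \\[1ex]
\dfrac{1}{1 + y^2/x^2}\left(\dfrac{-y}{x^2}\right) & \dfrac{1}{1 + y^2/x^2}\left(\dfrac{1}{x}\right)
\end{vmatrix}
= 2 \begin{vmatrix}
x & y \\[1ex]
-\dfrac{y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2}
\end{vmatrix}
= 2
\]

Hence, from Section 2.5.3 the joint density of \(R^2\) and \(\Theta\) is given by

\[
f_{R^2, \Theta}(d, \theta) = \frac{1}{2\pi}\, e^{-d/2}\, \frac{1}{2}
= \frac{1}{2}\, e^{-d/2}\, \frac{1}{2\pi}, \qquad 0 < d < \infty,\ 0 < \theta < 2\pi
\]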
Thus, we can conclude that \(R^2\) and \(\Theta\) are independent with \(R^2\) having an
exponential distribution with rate 1/2 and \(\Theta\) being uniform on \((0, 2\pi)\).

Let us now go in reverse from the polar to the rectangular coordinates. From
the preceding if we start with W, an exponential random variable with rate 1/2
(W plays the role of \(R^2\)) and with V, independent of W and uniformly distributed
over \((0, 2\pi)\) (V plays the role of \(\Theta\)) then \(X = \sqrt{W} \cos V\), \(Y = \sqrt{W} \sin V\) will be
independent standard normals. Hence, using the results of Example 11.3 we see
that if \(U_1\) and \(U_2\) are independent uniform (0, 1) random numbers, then

\[
\begin{aligned}
X &= (-2 \log U_1)^{1/2} \cos(2\pi U_2), \\
Y &= (-2 \log U_1)^{1/2} \sin(2\pi U_2)
\end{aligned}
\tag{11.4}
\]

are independent standard normal random variables.
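
Equation (11.4) translates directly into Python. In the sketch below the function name `box_muller` is hypothetical, and the first random number is taken as 1 − U so that the logarithm is never evaluated at 0 (an implementation choice of this sketch, not part of the derivation).

```python
import math
import random


def box_muller(u1, u2):
    """Map random numbers u1 in (0, 1] and u2 in [0, 1) to two standard normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(99)
values = []
for _ in range(25_000):
    x, y = box_muller(1.0 - rng.random(), rng.random())
    values.extend([x, y])

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, variance)  # should be near 0 and 1
```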


Remark The fact that \(X^2 + Y^2\) has an exponential distribution with rate 1/2 is
quite interesting for, by the definition of the chi-square distribution, \(X^2 + Y^2\)
has a chi-squared distribution with two degrees of freedom. Hence, these two
distributions are identical.

The preceding approach to generating standard normal random variables is
called the Box-Muller approach. Its efficiency suffers somewhat from its need
to compute the preceding sine and cosine values. There is, however, a way to
get around this potentially time-consuming difficulty. To begin, note that if U
is uniform on (0, 1), then 2U is uniform on (0, 2), and so 2U − 1 is uniform
on (−1, 1). Thus, if we generate random numbers \(U_1\) and \(U_2\) and set

\[ V_1 = 2U_1 - 1, \qquad V_2 = 2U_2 - 1 \]

then \((V_1, V_2)\) is uniformly distributed in the square of area 4 centered at (0, 0)
(see Figure 11.2).

Suppose now that we continually generate such pairs $(V_1, V_2)$ until we obtain
one that is contained in the circle of radius 1 centered at (0, 0), that is, until
$(V_1, V_2)$ is such that $V_1^2 + V_2^2 \le 1$. It now follows that such a pair
$(V_1, V_2)$ is uniformly distributed in the circle. If we let $R$, $\Theta$ denote the
polar coordinates of this pair, then it is easy to verify that $R$ and $\Theta$ are
independent, with $R^2$ being uniformly distributed on (0, 1), and $\Theta$ uniformly
distributed on $(0, 2\pi)$.

[Figure 11.2: the square with corners such as (−1, 1) and (1, 1), containing the circle of radius 1 centered at (0, 0)]

Since

$$\sin\Theta = V_2/R = \frac{V_2}{\sqrt{V_1^2 + V_2^2}}, \qquad
\cos\Theta = V_1/R = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}$$

it follows from Equation (11.4) that we can generate independent standard
normals $X$ and $Y$ by generating another random number $U$ and setting

$$X = (-2\log U)^{1/2}\, V_1/R, \qquad Y = (-2\log U)^{1/2}\, V_2/R$$

In fact, since (conditional on $V_1^2 + V_2^2 \le 1$) $R^2$ is uniform on (0, 1) and is
independent of $\Theta$, we can use it instead of generating a new random number $U$,
thus showing that

$$X = (-2\log R^2)^{1/2}\, V_1/R = \sqrt{\frac{-2\log S}{S}}\; V_1, \qquad
Y = (-2\log R^2)^{1/2}\, V_2/R = \sqrt{\frac{-2\log S}{S}}\; V_2$$

are independent standard normals, where

$$S = R^2 = V_1^2 + V_2^2$$


Summing up, we thus have the following approach to generating a pair of
independent standard normals:


Step 1: Generate random numbers $U_1$ and $U_2$.
Step 2: Set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$, $S = V_1^2 + V_2^2$.
Step 3: If $S > 1$, return to step 1.
Step 4: Return the independent unit normals

$$X = \sqrt{\frac{-2\log S}{S}}\; V_1, \qquad Y = \sqrt{\frac{-2\log S}{S}}\; V_2$$


The preceding is called the polar method. Since the probability that a random
point in the square will fall within the circle is equal to $\pi/4$ (the area of the
circle divided by the area of the square), it follows that, on average, the polar
method will require $4/\pi \approx 1.273$ iterations of step 1. Hence, it will, on
average, require 2.546 random numbers, 1 logarithm, 1 square root, 1 division, and
4.546 multiplications to generate 2 independent standard normals.
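The four steps of the polar method can be sketched as follows. This is an illustrative version built on Python's standard `random` module; the name `polar_normals` and the extra returned iteration count (included only so the $4/\pi$ figure can be checked empirically) are assumptions of this sketch.

```python
import math
import random

def polar_normals(rng=random):
    """Return (x, y, iterations): two independent standard normals and the
    number of passes through step 1 that were needed."""
    iterations = 0
    while True:
        iterations += 1
        v1 = 2.0 * rng.random() - 1.0      # step 2: uniform on (-1, 1)
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:                 # step 3; s == 0 is also skipped to avoid log(0)
            break
    factor = math.sqrt(-2.0 * math.log(s) / s)   # step 4
    return factor * v1, factor * v2, iterations
```

Averaging the third return value over many calls should give a number close to $4/\pi$.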


11.3.2 The Gamma Distribution 


To simulate from a gamma distribution with parameters $(n, \lambda)$, where $n$ is an
integer, we use the fact that the sum of $n$ independent exponential random
variables each having rate $\lambda$ has this distribution. Hence, if $U_1, \ldots, U_n$
are independent uniform (0, 1) random variables,

$$X = -\frac{1}{\lambda}\sum_{i=1}^{n} \log U_i = -\frac{1}{\lambda}\log\left(\prod_{i=1}^{n} U_i\right)$$


has the desired distribution. 
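A minimal sketch of this product-of-uniforms generator, assuming an integer shape parameter and Python's standard `random` module; the name `gamma_integer` is illustrative. Note that for very large $n$ the running product can underflow, so summing the logarithms would be the safer (if slower) variant.

```python
import math
import random

def gamma_integer(n, lam, rng=random):
    """Gamma(n, lam) variate for integer n, using a single logarithm."""
    product = 1.0
    for _ in range(n):
        product *= 1.0 - rng.random()    # uniform on (0, 1], so the product stays positive
    return -math.log(product) / lam
```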

When $n$ is large, there are other techniques available that do not require so many
random numbers. One possibility is to use the rejection procedure with $g(x)$ being
taken as the density of an exponential random variable with mean $n/\lambda$ (as this
is the mean of the gamma). It can be shown that for large $n$ the average number
of iterations needed by the rejection algorithm is $e[(n - 1)/2\pi]^{1/2}$. In addition,
if we wanted to generate a series of gammas, then, just as in Example 11.4, we
can arrange things so that upon acceptance we obtain not only a gamma random
variable but also, for free, an exponential random variable that can then be used
in obtaining the next gamma (see Exercise 8).


11.3.3 The Chi-Squared Distribution 


The chi-squared distribution with $n$ degrees of freedom is the distribution of
$\chi_n^2 = Z_1^2 + \cdots + Z_n^2$ where $Z_i$, $i = 1, \ldots, n$ are independent
standard normals. Using the fact noted in the remark at the end of Section 3.1 we
see that $Z_1^2 + Z_2^2$ has an exponential distribution with rate $\frac{1}{2}$. Hence,
when $n$ is even (say, $n = 2k$), $\chi_{2k}^2$ has a gamma distribution with parameters
$(k, \frac{1}{2})$. Hence, $-2\log\left(\prod_{i=1}^{k} U_i\right)$ has a chi-squared
distribution with $2k$ degrees of freedom. We can simulate a chi-squared random
variable with $2k + 1$ degrees of freedom by first simulating a standard normal
random variable $Z$ and then adding $Z^2$ to the preceding. That is,

$$\chi_{2k+1}^2 = Z^2 - 2\log\left(\prod_{i=1}^{k} U_i\right)$$

where $Z, U_1, \ldots, U_k$ are independent with $Z$ being a standard normal and the
others being uniform (0, 1) random variables.
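Both the even and the odd case fit in one short function. The sketch below is illustrative only: the name `chi_squared` is hypothetical, and the standard normal $Z$ is drawn with the library call `random.gauss` rather than with the polar method, purely to keep the example short.

```python
import math
import random

def chi_squared(df, rng=random):
    """Chi-squared variate with df >= 1 degrees of freedom."""
    k, odd = divmod(df, 2)
    product = 1.0
    for _ in range(k):
        product *= 1.0 - rng.random()    # uniform on (0, 1]
    value = -2.0 * math.log(product)     # chi-squared with 2k degrees of freedom
    if odd:
        z = rng.gauss(0.0, 1.0)          # one extra standard normal for 2k + 1
        value += z * z
    return value
```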


11.3.4 The Beta (n, m) Distribution 


The random variable $X$ is said to have a beta distribution with parameters $n$, $m$
if its density is given by

$$f(x) = \frac{(n + m - 1)!}{(n - 1)!\,(m - 1)!}\, x^{n-1}(1 - x)^{m-1}, \qquad 0 < x < 1$$

One approach to simulating from the preceding distribution is to let
$U_1, \ldots, U_{n+m-1}$ be independent uniform (0, 1) random variables and consider
the $n$th smallest value of this set, call it $U_{(n)}$. Now $U_{(n)}$ will equal $x$ if, of
the $n + m - 1$ variables,


(i) $n - 1$ are smaller than $x$,
(ii) one equals $x$,
(iii) $m - 1$ are greater than $x$.


Hence, if the $n + m - 1$ uniform random variables are partitioned into three
subsets of sizes $n - 1$, 1, and $m - 1$ the probability (density) that each of the
variables in the first set is less than $x$, the variable in the second set equals $x$, and
all the variables in the third set are greater than $x$ is given by

$$(P\{U < x\})^{n-1} f_U(x)\, (P\{U > x\})^{m-1} = x^{n-1}(1 - x)^{m-1}$$

Hence, as there are $(n + m - 1)!/[(n - 1)!\,(m - 1)!]$ possible partitions, it follows
that $U_{(n)}$ is beta with parameters $(n, m)$.

Thus, one way to simulate from the beta distribution is to find the $n$th smallest
of a set of $n + m - 1$ random numbers. However, when $n$ and $m$ are large, this
procedure is not particularly efficient.

For another approach consider a Poisson process with rate 1, and recall that
given $S_{n+m}$, the time of the $(n + m)$th event, the set of the first $n + m - 1$ event
times is distributed independently and uniformly on $(0, S_{n+m})$. Hence, given
$S_{n+m}$, the $n$th smallest of the first $n + m - 1$ event times (that is, $S_n$) is
distributed as the $n$th smallest of a set of $n + m - 1$ uniform $(0, S_{n+m})$ random
variables. But from the preceding we can thus conclude that $S_n/S_{n+m}$ has a beta
distribution with parameters $(n, m)$. Therefore, if $U_1, \ldots, U_{n+m}$ are random
numbers,

$$\frac{-\log \prod_{i=1}^{n} U_i}{-\log \prod_{i=1}^{n+m} U_i} \quad \text{is beta with parameters } (n, m)$$

By writing the preceding as

$$\frac{-\log \prod_{i=1}^{n} U_i}{-\log \prod_{i=1}^{n} U_i - \log \prod_{i=n+1}^{n+m} U_i}$$

we see that it has the same distribution as $X/(X + Y)$ where $X$ and $Y$ are
independent gamma random variables with respective parameters $(n, 1)$ and $(m, 1)$.
Hence, when $n$ and $m$ are large, we can efficiently simulate a beta by first
simulating two gamma random variables.
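The two approaches of this section can be compared side by side. This is a sketch under the assumption of integer parameters $n, m \ge 1$; the function names and the helper `_neg_log_product` are hypothetical, and the gamma variates are generated by the simple product-of-uniforms formula rather than by a method tuned for large parameters.

```python
import math
import random

def beta_order_statistic(n, m, rng=random):
    """Beta(n, m) as the n-th smallest of n + m - 1 random numbers."""
    draws = sorted(rng.random() for _ in range(n + m - 1))
    return draws[n - 1]

def _neg_log_product(count, rng):
    """-log of a product of `count` uniforms, i.e. a gamma(count, 1) variate."""
    product = 1.0
    for _ in range(count):
        product *= 1.0 - rng.random()    # uniform on (0, 1]
    return -math.log(product)

def beta_from_gammas(n, m, rng=random):
    """Beta(n, m) as X / (X + Y) with X ~ gamma(n, 1), Y ~ gamma(m, 1)."""
    x = _neg_log_product(n, rng)
    y = _neg_log_product(m, rng)
    return x / (x + y)
```

Both functions should produce samples with mean $n/(n + m)$; the second needs $n + m$ random numbers but no sort.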


11.3.5 The Exponential Distribution—The Von Neumann Algorithm 


As we have seen, an exponential random variable with rate 1 can be simulated by 
computing the negative of the logarithm of a random number. Most computer 
programs for computing a logarithm, however, involve a power series expansion, 
and so it might be useful to have at hand a second method that is computationally 
easier. We now present such a method due to Von Neumann. 

To begin let $U_1, U_2, \ldots$ be independent uniform (0, 1) random variables and
define $N$, $N \ge 2$, by

$$N = \min\{n: U_1 \ge U_2 \ge \cdots \ge U_{n-1} < U_n\}$$

That is, $N$ is the index of the first random number that is greater than its
predecessor. Let us now compute the joint distribution of $N$ and $U_1$.


$$P\{N > n, U_1 \le y\} = \int_0^1 P\{N > n, U_1 \le y \mid U_1 = x\}\, dx
= \int_0^y P\{N > n \mid U_1 = x\}\, dx$$

Now, given that $U_1 = x$, $N$ will be greater than $n$ if $x \ge U_2 \ge \cdots \ge U_n$ or,
equivalently, if

(a) $U_i \le x$, $i = 2, \ldots, n$, and
(b) $U_2 \ge \cdots \ge U_n$

Now, (a) has probability $x^{n-1}$ of occurring and given (a), since all of the $(n - 1)!$
possible rankings of $U_2, \ldots, U_n$ are equally likely, (b) has probability $1/(n - 1)!$
of occurring. Hence,

$$P\{N > n \mid U_1 = x\} = \frac{x^{n-1}}{(n - 1)!}$$

and so

$$P\{N > n, U_1 \le y\} = \int_0^y \frac{x^{n-1}}{(n - 1)!}\, dx = \frac{y^n}{n!}$$
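which yields

$$P\{N = n, U_1 \le y\} = P\{N > n - 1, U_1 \le y\} - P\{N > n, U_1 \le y\}
= \frac{y^{n-1}}{(n - 1)!} - \frac{y^n}{n!}$$

Upon summing over all the even integers, we see that

$$P\{N \text{ is even}, U_1 \le y\} = y - \frac{y^2}{2!} + \frac{y^3}{3!} - \frac{y^4}{4!} + \cdots
= 1 - e^{-y} \tag{11.5}$$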


We are now ready for the following algorithm for generating an exponential 
random variable with rate 1. 


Step 1: Generate uniform random numbers $U_1, U_2, \ldots$ stopping at
$N = \min\{n: U_1 \ge \cdots \ge U_{n-1} < U_n\}$.

Step 2: If N is even accept that run, and go to step 3. If N is odd reject the run, and 
return to step 1. 

Step 3: Set X equal to the number of failed runs plus the first random number in the 
successful run. 


To show that $X$ is exponential with rate 1, first note that the probability of a
successful run is, from Equation (11.5) with $y = 1$,

$$P\{N \text{ is even}\} = 1 - e^{-1}$$

Now, in order for $X$ to exceed $x$, the first $[x]$ runs must all be unsuccessful and
the next run must either be unsuccessful or be successful but have $U_1 > x - [x]$
(where $[x]$ is the largest integer not exceeding $x$). As

$$P\{N \text{ even}, U_1 > y\} = P\{N \text{ even}\} - P\{N \text{ even}, U_1 \le y\}
= 1 - e^{-1} - (1 - e^{-y}) = e^{-y} - e^{-1}$$

we see that

$$P\{X > x\} = e^{-[x]}\left[e^{-1} + e^{-(x - [x])} - e^{-1}\right] = e^{-x}$$

which yields the result.

Let $T$ denote the number of trials needed to generate a successful run. As each
trial is a success with probability $1 - e^{-1}$ it follows that $T$ is geometric with
mean $1/(1 - e^{-1})$. If we let $N_i$ denote the number of uniform random variables
used on the $i$th run, $i \ge 1$, then $T$ (being the first run $i$ for which $N_i$ is even)
is a stopping time for this sequence. Hence, by Wald's equation, the mean number of
uniform random variables needed by this algorithm is given by

$$E\left[\sum_{i=1}^{T} N_i\right] = E[N]\,E[T]$$

Now,

$$E[N] = \sum_{n=0}^{\infty} P\{N > n\}
= 1 + \sum_{n=1}^{\infty} P\{U_1 \ge \cdots \ge U_n\}
= 1 + \sum_{n=1}^{\infty} \frac{1}{n!} = e$$

and so

$$E\left[\sum_{i=1}^{T} N_i\right] = \frac{e}{1 - e^{-1}} \approx 4.3$$

Hence, this algorithm, which computationally speaking is quite easy to perform,
requires on the average about 4.3 random numbers to execute.
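The three steps of the Von Neumann algorithm need only comparisons and one addition. The sketch below is an illustrative rendering; the name `von_neumann_exponential` and the second return value (the count of random numbers consumed, included so that the figure of about 4.3 can be checked) are assumptions of this example.

```python
import random

def von_neumann_exponential(rng=random):
    """Return (x, used): a rate-1 exponential and the number of uniforms consumed."""
    failed_runs = 0
    used = 0
    while True:
        first = rng.random()          # U_1 of the current run
        used += 1
        previous = first
        n = 1
        while True:                   # step 1: extend the run until U_{n-1} < U_n
            current = rng.random()
            used += 1
            n += 1
            if current > previous:
                break
            previous = current
        if n % 2 == 0:                # step 2: N even, so the run is accepted
            return failed_runs + first, used   # step 3
        failed_runs += 1
```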


11.4 Simulating from Discrete Distributions 
All of the general methods for simulating from continuous distributions have
analogs in the discrete case. For instance, if we want to simulate a random variable
$X$ having probability mass function

$$P\{X = x_j\} = P_j, \quad j = 1, 2, \ldots, \qquad \sum_j P_j = 1$$

we can use the following discrete time analog of the inverse transform technique:


To simulate $X$ for which $P\{X = x_j\} = P_j$
let $U$ be uniformly distributed over (0, 1), and set

$$X = \begin{cases}
x_1, & \text{if } U < P_1 \\
x_2, & \text{if } P_1 < U < P_1 + P_2 \\
\ \vdots \\
x_j, & \text{if } \sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i \\
\ \vdots
\end{cases}$$

As

$$P\{X = x_j\} = P\left\{\sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i\right\} = P_j$$


we see that X has the desired distribution. 
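For a finite list of values, the discrete inverse transform amounts to a running sum. A minimal sketch, assuming the probabilities are given as a Python list that sums to 1; the function name `discrete_inverse_transform` is illustrative.

```python
import random

def discrete_inverse_transform(values, probs, rng=random):
    """Return values[j] with probability probs[j]."""
    u = rng.random()
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p               # P_1 + ... + P_j
        if u < cumulative:
            return value
    return values[-1]                 # guard against round-off in the final partial sum
```

The expected number of comparisons is smallest when the values are listed in decreasing order of probability.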


Example 11.7 (The Geometric Distribution) Suppose we want to simulate $X$
such that

$$P\{X = i\} = p(1 - p)^{i-1}, \qquad i \ge 1$$

As

$$\sum_{i=1}^{j-1} P\{X = i\} = 1 - P\{X > j - 1\} = 1 - (1 - p)^{j-1}$$

we can simulate such a random variable by generating a random number $U$ and
then setting $X$ equal to that value $j$ for which

$$1 - (1 - p)^{j-1} < U < 1 - (1 - p)^{j}$$

or, equivalently, for which

$$(1 - p)^{j} < 1 - U < (1 - p)^{j-1}$$

As $1 - U$ has the same distribution as $U$, we can thus define $X$ by

$$X = \min\{j: (1 - p)^{j} < U\} = \min\left\{j: j > \frac{\log U}{\log(1 - p)}\right\}
= 1 + \left[\frac{\log U}{\log(1 - p)}\right]$$

■
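The closed form at the end of Example 11.7 is a one-liner in code. A sketch assuming $0 < p < 1$, with the hypothetical name `geometric`:

```python
import math
import random

def geometric(p, rng=random):
    """Geometric variate on {1, 2, ...} with success probability p, 0 < p < 1."""
    u = 1.0 - rng.random()                                 # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))        # 1 + integer part of the ratio
```

Because both logarithms are nonpositive, the ratio is nonnegative and `int` truncation coincides with the integer-part bracket used in the text.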
As in the continuous case, special simulation techniques have been developed for
the more common discrete distributions. We now present certain of these.


Example 11.8 (Simulating a Binomial Random Variable) A binomial $(n, p)$
random variable can be most easily simulated by recalling that it can be expressed
as the sum of $n$ independent Bernoulli random variables. That is, if $U_1, \ldots, U_n$
are independent uniform (0, 1) variables, then letting

$$X_i = \begin{cases} 1, & \text{if } U_i < p \\ 0, & \text{otherwise} \end{cases}$$

it follows that $X \equiv \sum_{i=1}^{n} X_i$ is a binomial random variable with
parameters $n$ and $p$.

One difficulty with this procedure is that it requires the generation of $n$ random
numbers. To show how to reduce the number of random numbers needed, note
first that this procedure does not use the actual value of a random number $U$ but
only whether or not it exceeds $p$. Using this and the result that the conditional
distribution of $U$ given that $U < p$ is uniform on $(0, p)$ and the conditional
distribution of $U$ given that $U > p$ is uniform on $(p, 1)$, we now show how we can
simulate a binomial $(n, p)$ random variable using only a single random number:


Step 1: Let $\alpha = 1/p$, $\beta = 1/(1 - p)$.

Step 2: Set $k = 0$.

Step 3: Generate a uniform random number $U$.

Step 4: If $k = n$ stop. Otherwise reset $k$ to equal $k + 1$.

Step 5: If $U \le p$ set $X_k = 1$ and reset $U$ to equal $\alpha U$. If $U > p$ set $X_k = 0$
and reset $U$ to equal $\beta(U - p)$. Return to step 4.


This procedure generates $X_1, \ldots, X_n$ and $X = \sum_{i=1}^{n} X_i$ is the desired
random variable. It works by noting whether $U_k \le p$ or $U_k > p$; in the former case
it takes $U_{k+1}$ to equal $U_k/p$, and in the latter case it takes $U_{k+1}$ to equal
$(U_k - p)/(1 - p)$.† ■
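Steps 1 through 5 can be sketched as below. The function name `binomial_single_uniform` is hypothetical. Because every rescaling uses up some of the finite precision of the one floating-point random number, this sketch should only be trusted for moderate $n$ (a few dozen trials with double precision).

```python
import random

def binomial_single_uniform(n, p, rng=random):
    """Binomial(n, p) variate built from a single random number, 0 < p < 1."""
    alpha = 1.0 / p                   # step 1
    beta = 1.0 / (1.0 - p)
    u = rng.random()                  # step 3: the only random number drawn
    x = 0
    for _ in range(n):                # steps 4 and 5, repeated n times
        if u <= p:
            x += 1
            u = alpha * u             # conditionally uniform on (0, 1) again
        else:
            u = beta * (u - p)
    return x
```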


†Because of computer round-off errors, a single random number should not be
continuously used when $n$ is large.


Example 11.9 (Simulating a Poisson Random Variable) To simulate a Poisson
random variable with mean $\lambda$, generate independent uniform (0, 1) random
variables $U_1, U_2, \ldots$ stopping at

$$N + 1 = \min\left\{n: \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}$$

The random variable $N$ has the desired distribution, which can be seen by noting
that

$$N = \max\left\{n: \sum_{i=1}^{n} -\log U_i < \lambda\right\}$$

But $-\log U_i$ is exponential with rate 1, and so if we interpret $-\log U_i$, $i \ge 1$, as
the interarrival times of a Poisson process having rate 1, we see that $N = N(\lambda)$
would equal the number of events by time $\lambda$. Hence $N$ is Poisson with mean $\lambda$.
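The stopping rule of Example 11.9 can be sketched as follows; the name `poisson` is illustrative. The threshold $e^{-\lambda}$ underflows to zero in double precision for $\lambda$ above roughly 745, so this direct form is only suitable for moderate means.

```python
import math
import random

def poisson(lam, rng=random):
    """Poisson(lam) variate: multiply uniforms until the product drops below e^{-lam}."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product >= threshold:
        product *= rng.random()
        count += 1
    return count                      # N = (number of uniforms used) - 1
```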

When $\lambda$ is large we can reduce the amount of computation in the preceding
simulation of $N(\lambda)$, the number of events by time $\lambda$ of a Poisson process having
rate 1, by first choosing an integer $m$ and simulating $S_m$, the time of the $m$th event
of the Poisson process, and then simulating $N(\lambda)$ according to the conditional
distribution of $N(\lambda)$ given $S_m$. Now the conditional distribution of $N(\lambda)$ given
$S_m$ is as follows:

$$N(\lambda) \mid S_m = s \ \sim\ m + \text{Poisson}(\lambda - s), \qquad \text{if } s < \lambda$$

$$N(\lambda) \mid S_m = s \ \sim\ \text{Binomial}\left(m - 1, \frac{\lambda}{s}\right), \qquad \text{if } s > \lambda$$

where $\sim$ means "has the distribution of." This follows since if the $m$th event
occurs at time $s$, where $s < \lambda$, then the number of events by time $\lambda$ is $m$ plus
the number of events in $(s, \lambda)$. On the other hand given that $S_m = s$ the set of
times at which the first $m - 1$ events occur has the same distribution as a set of
$m - 1$ uniform $(0, s)$ random variables (see Section 5.3.5). Hence, when $\lambda < s$,
the number of these that occur by time $\lambda$ is binomial with parameters $m - 1$ and
$\lambda/s$. Hence, we can simulate $N(\lambda)$ by first simulating $S_m$ and then simulating
either $P(\lambda - S_m)$, a Poisson random variable with mean $\lambda - S_m$, when $S_m < \lambda$, or
$\text{Bin}(m - 1, \lambda/S_m)$, a binomial random variable with parameters $m - 1$
and $\lambda/S_m$, when $S_m > \lambda$; and then setting

$$N(\lambda) = \begin{cases}
m + P(\lambda - S_m), & \text{if } S_m < \lambda \\
\text{Bin}(m - 1, \lambda/S_m), & \text{if } S_m > \lambda
\end{cases}$$

In the preceding it has been found computationally effective to let $m$ be
approximately $\frac{7}{8}\lambda$. Of course, $S_m$ is simulated by simulating from a gamma
$(m, \lambda)$ distribution via an approach that is computationally fast when $m$ is large
(see Section 11.3.3). ■


There are also rejection and hazard rate methods for discrete distributions but 
we leave their development as exercises. However, there is a technique available 
for simulating finite discrete random variables—called the alias method—which, 
though requiring some setup time, is very fast to implement. 


11.4.1 The Alias Method 


In what follows, the quantities $P$, $P^{(k)}$, $Q^{(k)}$, $k \le n - 1$ will represent
probability mass functions on the integers $1, 2, \ldots, n$; that is, they will be
$n$-vectors of nonnegative numbers summing to 1. In addition, the vector $P^{(k)}$ will
have at most $k$ nonzero components, and each of the $Q^{(k)}$ will have at most two
nonzero components. We show that any probability mass function $P$ can be
represented as an equally weighted mixture of $n - 1$ probability mass functions $Q$
(each having at most two nonzero components). That is, we show that for suitably
defined $Q^{(1)}, \ldots, Q^{(n-1)}$, $P$ can be expressed as

$$P = \frac{1}{n - 1}\sum_{k=1}^{n-1} Q^{(k)} \tag{11.6}$$

As a prelude to presenting the method for obtaining this representation, we will
need the following simple lemma whose proof is left as an exercise.


Lemma 11.5 Let $P = \{P_i, i = 1, \ldots, n\}$ denote a probability mass function,
then


(a) there exists an $i$, $1 \le i \le n$, such that $P_i < 1/(n - 1)$, and
(b) for this $i$, there exists a $j$, $j \ne i$, such that $P_i + P_j \ge 1/(n - 1)$.

Before presenting the general technique for obtaining the representation of
Equation (11.6), let us illustrate it by an example.


Example 11.10 Consider the three-point distribution $P$ with $P_1 = \frac{7}{16}$,
$P_2 = \frac{1}{2}$, $P_3 = \frac{1}{16}$. We start by choosing $i$ and $j$ such that they
satisfy the conditions of Lemma 11.5. As $P_3 < \frac{1}{2}$ and $P_3 + P_2 > \frac{1}{2}$ we
can work with $i = 3$ and $j = 2$. We will now define a two-point mass function $Q^{(1)}$
putting all of its weight on 3 and 2 and such that $P$ will be expressible as an
equally weighted mixture between $Q^{(1)}$ and a second two-point mass function
$Q^{(2)}$. Secondly, all of the mass of point 3 will be contained in $Q^{(1)}$. As we will
have

$$P_j = \frac{1}{2}\left(Q_j^{(1)} + Q_j^{(2)}\right), \qquad j = 1, 2, 3 \tag{11.7}$$

and, by the preceding, $Q_3^{(2)}$ is supposed to equal 0, we must therefore take

$$Q_3^{(1)} = 2P_3 = \frac{1}{8}, \qquad Q_2^{(1)} = 1 - Q_3^{(1)} = \frac{7}{8}, \qquad Q_1^{(1)} = 0$$

To satisfy Equation (11.7), we must then set

$$Q_3^{(2)} = 0, \qquad Q_2^{(2)} = 2P_2 - \frac{7}{8} = \frac{1}{8}, \qquad Q_1^{(2)} = 2P_1 = \frac{7}{8}$$

Hence, we have the desired representation in this case. Suppose now that the
original distribution was the following four-point mass function:

$$P_1 = \frac{7}{16}, \qquad P_2 = \frac{1}{4}, \qquad P_3 = \frac{1}{8}, \qquad P_4 = \frac{3}{16}$$

Now, $P_3 < \frac{1}{3}$ and $P_3 + P_1 > \frac{1}{3}$. Hence our initial two-point mass
function $Q^{(1)}$ will concentrate on points 3 and 1 (giving no weights to 2 and 4).
As the final representation will give weight $\frac{1}{3}$ to $Q^{(1)}$ and in addition the
other $Q^{(j)}$, $j = 2, 3$, will not give any mass to the value 3, we must have

$$\frac{1}{3} Q_3^{(1)} = P_3 = \frac{1}{8}$$

Hence,

$$Q_3^{(1)} = \frac{3}{8}, \qquad Q_1^{(1)} = 1 - \frac{3}{8} = \frac{5}{8}$$

Also, we can write

$$P = \frac{1}{3} Q^{(1)} + \frac{2}{3} P^{(3)}$$

where $P^{(3)}$, to satisfy the preceding, must be the vector

$$P_1^{(3)} = \frac{3}{2}\left(P_1 - \frac{1}{3} Q_1^{(1)}\right) = \frac{11}{32}, \qquad
P_2^{(3)} = \frac{3}{2} P_2 = \frac{3}{8}, \qquad
P_3^{(3)} = 0, \qquad
P_4^{(3)} = \frac{3}{2} P_4 = \frac{9}{32}$$

Note that $P^{(3)}$ gives no mass to the value 3. We can now express the mass function
$P^{(3)}$ as an equally weighted mixture of two-point mass functions $Q^{(2)}$ and
$Q^{(3)}$, and we will end up with

$$P = \frac{1}{3} Q^{(1)} + \frac{2}{3}\left(\frac{1}{2} Q^{(2)} + \frac{1}{2} Q^{(3)}\right)
= \frac{1}{3}\left(Q^{(1)} + Q^{(2)} + Q^{(3)}\right)$$

(We leave it as an exercise for you to fill in the details.) ■


The preceding example outlines the following general procedure for writing
the $n$-point mass function $P$ in the form of Equation (11.6) where each of the
$Q^{(i)}$ are mass functions giving all their mass to at most two points. To start, we
choose $i$ and $j$ satisfying the conditions of Lemma 11.5. We now define the mass
function $Q^{(1)}$ concentrating on the points $i$ and $j$ and which will contain all of the
mass for point $i$ by noting that, in the representation of Equation (11.6),
$Q_i^{(k)} = 0$ for $k = 2, \ldots, n - 1$, implying that

$$Q_i^{(1)} = (n - 1)P_i, \qquad \text{and so} \qquad Q_j^{(1)} = 1 - (n - 1)P_i$$

Writing

$$P = \frac{1}{n - 1} Q^{(1)} + \frac{n - 2}{n - 1} P^{(n-1)} \tag{11.8}$$

where $P^{(n-1)}$ represents the remaining mass, we see that

$$P_i^{(n-1)} = 0,$$

$$P_j^{(n-1)} = \frac{n - 1}{n - 2}\left(P_j - \frac{1}{n - 1} Q_j^{(1)}\right)
= \frac{n - 1}{n - 2}\left(P_i + P_j - \frac{1}{n - 1}\right),$$

$$P_k^{(n-1)} = \frac{n - 1}{n - 2} P_k, \qquad k \ne i \text{ or } j$$

That the foregoing is indeed a probability mass function is easily checked; for
instance, the nonnegativity of $P_j^{(n-1)}$ follows from the fact that $j$ was chosen so
that $P_i + P_j \ge 1/(n - 1)$.

We may now repeat the foregoing procedure on the $(n - 1)$-point probability
mass function $P^{(n-1)}$ to obtain

$$P^{(n-1)} = \frac{1}{n - 2} Q^{(2)} + \frac{n - 3}{n - 2} P^{(n-2)}$$

and thus from Equation (11.8) we have

$$P = \frac{1}{n - 1} Q^{(1)} + \frac{1}{n - 1} Q^{(2)} + \frac{n - 3}{n - 1} P^{(n-2)}$$

We now repeat the procedure on $P^{(n-2)}$ and so on until we finally obtain

$$P = \frac{1}{n - 1}\left(Q^{(1)} + \cdots + Q^{(n-1)}\right)$$

In this way we are able to represent $P$ as an equally weighted mixture of $n - 1$
two-point mass functions. We can now easily simulate from $P$ by first generating
a random integer $N$ equally likely to be either $1, 2, \ldots$, or $n - 1$. If the resulting
value $N$ is such that $Q^{(N)}$ puts positive weight only on the points $i_N$ and $j_N$, then
we can set $X$ equal to $i_N$ if a second random number is less than $Q_{i_N}^{(N)}$ and equal
to $j_N$ otherwise. The random variable $X$ will have probability mass function $P$.
That is, we have the following procedure for simulating from $P$:

Step 1: Generate $U_1$ and set $N = 1 + [(n - 1)U_1]$.
Step 2: Generate $U_2$ and set

$$X = \begin{cases} i_N, & \text{if } U_2 < Q_{i_N}^{(N)} \\ j_N, & \text{otherwise} \end{cases}$$

Remarks 


(i) The preceding is called the alias method because by a renumbering of the $Q$s we
can always arrange things so that for each $k$, $Q_k^{(k)} > 0$. (That is, we can arrange
things so that the $k$th two-point mass function gives positive weight to the value
$k$.) Hence, the procedure calls for simulating $N$, equally likely to be $1, 2, \ldots$, or
$n - 1$, and then if $N = k$ it either accepts $k$ as the value of $X$, or it accepts for the
value of $X$ the "alias" of $k$ (namely, the other value that $Q^{(k)}$ gives positive
weight).

(ii) Actually, it is not necessary to generate a new random number in step 2. Because
$N - 1$ is the integer part of $(n - 1)U_1$, it follows that the remainder
$(n - 1)U_1 - (N - 1)$ is independent of $N$ and is uniformly distributed in (0, 1).
Hence, rather than generating a new random number $U_2$ in step 2, we can use
$(n - 1)U_1 - (N - 1) = (n - 1)U_1 - [(n - 1)U_1]$.
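The decomposition and the two-step sampler can be sketched together. In the illustrative code below, points are indexed from 0 rather than 1, $i$ is always taken to be a point of smallest remaining mass and $j$ a point of largest remaining mass (one valid way to satisfy Lemma 11.5), and the sampler reuses the fractional part of $(n-1)U_1$ as in Remark (ii). The names `build_alias_tables` and `alias_sample` are hypothetical.

```python
import random

def build_alias_tables(p):
    """Split the pmf p (a list of n >= 2 probabilities) into n - 1 two-point pmfs.
    Each entry (i, j, q) means: mass q on point i and mass 1 - q on point j."""
    remaining = list(p)                    # the current P^(r) on the active points
    active = list(range(len(p)))
    tables = []
    for r in range(len(p), 1, -1):         # r = number of active points
        i = min(active, key=lambda idx: remaining[idx])
        j = max((idx for idx in active if idx != i), key=lambda idx: remaining[idx])
        tables.append((i, j, (r - 1) * remaining[i]))
        if r > 2:                          # form the next (r - 1)-point pmf
            scale = (r - 1) / (r - 2)
            new_j = scale * (remaining[i] + remaining[j] - 1.0 / (r - 1))
            for idx in active:
                remaining[idx] *= scale
            remaining[j] = max(new_j, 0.0)     # clamp tiny negative round-off
        remaining[i] = 0.0
        active.remove(i)
    return tables

def alias_sample(tables, rng=random):
    """Draw one value using a single random number."""
    scaled = len(tables) * rng.random()
    k = min(int(scaled), len(tables) - 1)  # N - 1, guarded against round-off
    i, j, q = tables[k]
    return i if scaled - k < q else j      # fractional part plays the role of U_2
```

For the four-point mass function of Example 11.10, `build_alias_tables([7/16, 1/4, 1/8, 3/16])` produces three two-point mass functions whose equally weighted mixture is the original $P$; the setup cost is paid once, after which each sample takes constant time.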


Example 11.11 Let us return to the problem of Example 11.1, which considers
a list of $n$, not necessarily distinct, items. Each item has a value, $v(i)$ being the
value of the item in position $i$, and we are interested in estimating

$$v = \sum_{i=1}^{n} v(i)/m(i)$$

where $m(i)$ is the number of times the item in position $i$ appears on the list. In
words, $v$ is the sum of the values of the (distinct) items on the list.
To estimate $v$, note that if $X$ is a random variable such that

$$P\{X = i\} = v(i) \Big/ \sum_{j=1}^{n} v(j), \qquad i = 1, \ldots, n$$

then

$$E[1/m(X)] = \frac{\sum_{i} v(i)/m(i)}{\sum_{j} v(j)} = v \Big/ \sum_{j=1}^{n} v(j)$$

Hence, we can estimate $v$ by using the alias (or any other) method to generate
independent random variables $X_1, \ldots, X_k$ having the same distribution as $X$ and
then estimating $v$ by

$$v \approx \frac{1}{k} \sum_{j=1}^{n} v(j) \sum_{i=1}^{k} 1/m(X_i)$$

■
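The estimator of Example 11.11 can be sketched as follows. This is an illustration only: the standard-library call `random.choices` stands in for the alias method as the way of sampling positions with probability proportional to $v(i)$, and the names `estimate_distinct_value_sum`, `items`, and `value` are hypothetical.

```python
import random

def estimate_distinct_value_sum(items, value, k, rng=random):
    """Estimate the sum of value(item) over the distinct items of the list."""
    weights = [value(item) for item in items]      # v(i) for each position i
    total = sum(weights)                           # sum over j of v(j)
    positions = rng.choices(range(len(items)), weights=weights, k=k)
    # m(X_i) is counted only for the k sampled positions
    inverse_counts = sum(1.0 / items.count(items[pos]) for pos in positions)
    return total * inverse_counts / k
```

For instance, with `items = ['a', 'b', 'a', 'c', 'b', 'a']` and values 1, 2, 4 for `'a'`, `'b'`, `'c'`, the quantity being estimated is $1 + 2 + 4 = 7$.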
11.5 Stochastic Processes


We can easily simulate a stochastic process by simulating a sequence of random
variables. For instance, to simulate the first $t$ time units of a renewal process
having interarrival distribution $F$ we can simulate independent random variables
$X_1, X_2, \ldots$ having distribution $F$, stopping at

$$N = \min\{n: X_1 + \cdots + X_n > t\}$$

The $X_i$, $i \ge 1$, represent the interarrival times of the renewal process and so the
preceding simulation yields $N - 1$ events by time $t$, the events occurring at times
$X_1, X_1 + X_2, \ldots, X_1 + \cdots + X_{N-1}$.

Actually there is another approach for simulating a Poisson process that is quite
efficient. Suppose we want to simulate the first $t$ time units of a Poisson process
having rate $\lambda$. To do so, we can first simulate $N(t)$, the number of events by $t$,
and then use the result that given the value of $N(t)$, the set of $N(t)$ event times
is distributed as a set of $n$ independent uniform $(0, t)$ random variables. Hence,
we start by simulating $N(t)$, a Poisson random variable with mean $\lambda t$ (by one of
the methods given in Example 11.9). Then, if $N(t) = n$, generate a new set of $n$
random numbers, call them $U_1, \ldots, U_n$, and $\{tU_1, \ldots, tU_n\}$ will represent the
set of $N(t)$ event times. If we could stop here this would be much more efficient
than simulating the exponentially distributed interarrival times. However, we
usually desire the event times in increasing order; for instance, for $s < t$,

$$N(s) = \text{number of } i: tU_i \le s$$

and so to compute the function $N(s)$, $s \le t$, it is best to first order the values
$U_i$, $i = 1, \ldots, n$ before multiplying by $t$. However, in doing so you should not
use an all-purpose sorting algorithm, such as quick sort (see Example 3.14), but
rather one that takes into account that the elements to be sorted come from a
uniform (0, 1) population. Such a sorting algorithm of $n$ uniform (0, 1) variables
is as follows: Rather than a single list to be sorted of length $n$ we will consider $n$
ordered, or linked, lists of random size. The value $U$ will be put in list $i$ if its value
is between $(i - 1)/n$ and $i/n$; that is, $U$ is put in list $[nU] + 1$. The individual lists
are then ordered, and the total linkage of all the lists is the desired ordering. As
almost all of the lists will be of relatively small size (for instance, if $n = 1000$ the
mean number of lists of size greater than 4 is (using the Poisson approximation
to the binomial) approximately equal to $1000(1 - \frac{65}{24} e^{-1}) \approx 4$) the sorting of
individual lists will be quite quick, and so the running time of such an algorithm
will be proportional to $n$ (rather than to $n \log n$ as in the best all-purpose sorting
algorithms).
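The count-then-place approach, together with the list-based ordering just described, can be sketched as below. All function names are hypothetical; the Poisson count uses the product-of-uniforms rule of Example 11.9, so this sketch is only suitable for moderate values of $\lambda t$, and Python lists stand in for the linked lists of the text.

```python
import math
import random

def poisson_count(mean, rng=random):
    """Poisson variate by the product-of-uniforms rule of Example 11.9."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product >= threshold:
        product *= rng.random()
        count += 1
    return count

def bucket_sort_uniforms(us):
    """Order n values from (0, 1) by placing U in list [nU] + 1 (0-based here)."""
    n = len(us)
    buckets = [[] for _ in range(n)]
    for u in us:
        buckets[min(int(n * u), n - 1)].append(u)
    ordered = []
    for bucket in buckets:
        bucket.sort()                 # each list is short on average
        ordered.extend(bucket)
    return ordered

def poisson_process_event_times(lam, t, rng=random):
    """Ordered event times on (0, t) of a Poisson process with rate lam."""
    n = poisson_count(lam * t, rng)
    us = [rng.random() for _ in range(n)]
    return [t * u for u in bucket_sort_uniforms(us)]
```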

An extremely important counting process for modeling purposes is the
nonhomogeneous Poisson process, which relaxes the Poisson process assumption of
stationary increments. Thus it allows for the possibility that the arrival rate need
not be constant but can vary with time. However, there are few analytical studies
that assume a nonhomogeneous Poisson arrival process for the simple reason that
such models are not usually mathematically tractable. (For example, there is no
known expression for the average customer delay in the single-server exponential
service distribution queueing model that assumes a nonhomogeneous arrival
process.)* Clearly such models are strong candidates for simulation studies.


11.5.1 Simulating a Nonhomogeneous Poisson Process 


We now present three methods for simulating a nonhomogeneous Poisson process
having intensity function $\lambda(t)$, $0 \le t < \infty$.


Method 1. Sampling a Poisson Process 


To simulate the first $T$ time units of a nonhomogeneous Poisson process with
intensity function $\lambda(t)$, let $\lambda$ be such that

$$\lambda(t) \le \lambda \qquad \text{for all } t \le T$$

Now, as shown in Chapter 5, such a nonhomogeneous Poisson process can be
generated by a random selection of the event times of a Poisson process having
rate $\lambda$. That is, if an event of a Poisson process with rate $\lambda$ that occurs at time
$t$ is counted (independently of what has transpired previously) with probability
$\lambda(t)/\lambda$ then the process of counted events is a nonhomogeneous Poisson process
with intensity function $\lambda(t)$, $0 \le t \le T$. Hence, by simulating a Poisson process
and then randomly counting its events, we can generate the desired
nonhomogeneous Poisson process. We thus have the following procedure:

Generate independent random variables $X_1, U_1, X_2, U_2, \ldots$ where the $X_i$ are
exponential with rate $\lambda$ and the $U_i$ are random numbers, stopping at

$$N = \min\left\{n: \sum_{i=1}^{n} X_i > T\right\}$$

Now let, for $j = 1, \ldots, N - 1$,

$$I_j = \begin{cases} 1, & \text{if } U_j \le \lambda\left(\sum_{i=1}^{j} X_i\right)/\lambda \\ 0, & \text{otherwise} \end{cases}$$

and set

$$J = \{j: I_j = 1\}$$

Thus, the counting process having events at the set of times
$\left\{\sum_{i=1}^{j} X_i: j \in J\right\}$ constitutes the desired process.

*One queueing model that assumes a nonhomogeneous Poisson arrival process and is
mathematically tractable is the infinite server model.
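The procedure above can be sketched in a few lines. This is an illustrative version, assuming the intensity is supplied as a Python callable `rate_fn` and that `lam_bound` really does bound it on $[0, T]$; both names, and the function name `thinning`, are hypothetical.

```python
import random

def thinning(rate_fn, lam_bound, horizon, rng=random):
    """Event times on (0, horizon] of a nonhomogeneous Poisson process whose
    intensity rate_fn(t) never exceeds lam_bound on that interval."""
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_bound)            # next event of the rate-lam_bound process
        if t > horizon:
            return events
        if rng.random() <= rate_fn(t) / lam_bound:  # count it with probability lambda(t)/lambda
            events.append(t)
```

For example, `thinning(lambda s: 10.0 + s, 11.0, 1.0)` should return, on average, $\int_0^1 (10 + s)\,ds = 10.5$ event times.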

The foregoing procedure, referred to as the thinning algorithm (because it
"thins" the homogeneous Poisson points) will clearly be most efficient, in the
sense of having the fewest number of rejected event times, when $\lambda(t)$ is near $\lambda$
throughout the interval. Thus, an obvious improvement is to break up the
interval into subintervals and then use the procedure over each subinterval. That is,
determine appropriate values $k$, $0 < t_1 < t_2 < \cdots < t_k < T$, $\lambda_1, \ldots, \lambda_{k+1}$,
such that

$$\lambda(s) \le \lambda_i \quad \text{when } t_{i-1} \le s < t_i,\ i = 1, \ldots, k + 1 \quad (\text{where } t_0 = 0,\ t_{k+1} = T) \tag{11.9}$$

Now simulate the nonhomogeneous Poisson process over the interval $(t_{i-1}, t_i)$ by
generating exponential random variables with rate $\lambda_i$ and accepting the
generated event occurring at time $s$, $s \in (t_{i-1}, t_i)$, with probability $\lambda(s)/\lambda_i$.
Because of the memoryless property of the exponential and the fact that the rate of
an exponential can be changed upon multiplication by a constant, it follows that
there is no loss of efficiency in going from one subinterval to the next. In other
words, if we are at $t \in [t_{i-1}, t_i)$ and generate $X$, an exponential with rate $\lambda_i$,
which is such that $t + X > t_i$ then we can use $\lambda_i[X - (t_i - t)]/\lambda_{i+1}$ as the next
exponential with rate $\lambda_{i+1}$. Thus, we have the following algorithm for generating
the first $T$ time units of a nonhomogeneous Poisson process with intensity function
$\lambda(s)$ when the relations (11.9) are satisfied. In the algorithm, $t$ will represent the
present time and $I$ the present interval (that is, $I = i$ when $t_{i-1} \le t < t_i$).


Step 1: $t = 0$, $I = 1$.

Step 2: Generate an exponential random variable $X$ having rate $\lambda_I$.

Step 3: If $t + X < t_I$, reset $t = t + X$, generate a random number $U$, and accept the
event time $t$ if $U \le \lambda(t)/\lambda_I$. Return to step 2.

Step 4: (Step reached if $t + X \ge t_I$.) Stop if $I = k + 1$. Otherwise, reset
$X = (X - t_I + t)\lambda_I/\lambda_{I+1}$. Also reset $t = t_I$ and $I = I + 1$, and go to step 3.
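Steps 1 through 4 can be sketched as follows. In this illustrative version the list `breakpoints` holds $t_1, \ldots, t_{k+1}$ (with $t_{k+1} = T$), the list `bounds` holds $\lambda_1, \ldots, \lambda_{k+1}$, and the interval index is 0-based; these names and the function name `piecewise_thinning` are assumptions of the sketch.

```python
import random

def piecewise_thinning(rate_fn, breakpoints, bounds, rng=random):
    """Thinning with a separate bound on each subinterval, as in relations (11.9)."""
    events = []
    t = 0.0                                        # step 1
    interval = 0
    x = rng.expovariate(bounds[interval])          # step 2
    while True:
        if t + x < breakpoints[interval]:          # step 3
            t += x
            if rng.random() <= rate_fn(t) / bounds[interval]:
                events.append(t)
            x = rng.expovariate(bounds[interval])  # return to step 2
        else:                                      # step 4
            if interval == len(bounds) - 1:
                return events
            # reuse the overshoot, rescaled to the next interval's rate
            x = (x - (breakpoints[interval] - t)) * bounds[interval] / bounds[interval + 1]
            t = breakpoints[interval]
            interval += 1
```

With `rate_fn = lambda s: 1.0 + s`, `breakpoints = [1.0, 2.0]`, and `bounds = [2.0, 3.0]`, the expected number of returned event times is $\int_0^2 (1 + s)\,ds = 4$.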


Suppose now that over some subinterval $(t_{i-1}, t_i)$ it follows that
$\underline{\lambda}_i > 0$ where

$$\underline{\lambda}_i \equiv \inf\{\lambda(s): t_{i-1} \le s < t_i\}$$

In such a situation, we should not use the thinning algorithm directly but rather
should first simulate a Poisson process with rate $\underline{\lambda}_i$ over the desired
interval and then simulate a nonhomogeneous Poisson process with the intensity
function $\lambda(s) - \underline{\lambda}_i$ when $s \in (t_{i-1}, t_i)$. (The final exponential
generated for the Poisson process, which carries one beyond the desired boundary,
need not be wasted but can be suitably transformed so as to be reusable.) The
superposition (or, merging) of the two processes yields the desired process over the
interval. The reason for doing it this way is that it saves the need to generate
uniform random variables for a Poisson distributed number, with mean
$\underline{\lambda}_i(t_i - t_{i-1})$, of the event times. For instance, consider the case where

$$\lambda(s) = 10 + s, \qquad 0 < s < 1$$

Using the thinning method with $\lambda = 11$ would generate an expected number of
11 events each of which would require a random number to determine whether
or not to accept it. On the other hand, to generate a Poisson process with rate 10
and then merge it with a generated nonhomogeneous Poisson process with rate
$\lambda(s) = s$, $0 < s < 1$, would yield an equally distributed number of event times but
with the expected number needing to be checked to determine acceptance being
equal to 1.

Another way to make the simulation of nonhomogeneous Poisson processes
more efficient is to make use of superpositions. For instance, consider the process
where

$$\lambda(t) = \begin{cases}
\exp\{t^2\}, & 0 < t < 1.5 \\
\exp\{2.25\}, & 1.5 < t < 2.5 \\
\exp\{(4 - t)^2\}, & 2.5 < t < 4
\end{cases}$$

A plot of this intensity function is given in Figure 11.3. One way of simulating
this process up to time 4 is to first generate a Poisson process with rate 1 over
this interval; then generate a Poisson process with rate $e - 1$ over this interval,
accept all events in (1, 3), and only accept an event at time $t$ that is not contained
in (1, 3) with probability $[\lambda(t) - 1]/(e - 1)$; then generate a Poisson process with
rate $e^{2.25} - e$ over the interval (1, 3), accepting all event times between 1.5 and 2.5
and any event time $t$ outside this interval with probability
$[\lambda(t) - e]/(e^{2.25} - e)$. The superposition of these processes is the desired
nonhomogeneous Poisson process. In other words, what we have done is to break up
$\lambda(t)$ into the following nonnegative parts:

$$\lambda(t) = \lambda_1(t) + \lambda_2(t) + \lambda_3(t), \qquad 0 < t < 4$$

where

$$\lambda_1(t) \equiv 1,$$

$$\lambda_2(t) = \begin{cases}
\lambda(t) - 1, & 0 < t < 1 \\
e - 1, & 1 < t < 3 \\
\lambda(t) - 1, & 3 < t < 4
\end{cases}$$

$$\lambda_3(t) = \begin{cases}
\lambda(t) - e, & 1 < t < 1.5 \\
e^{2.25} - e, & 1.5 < t < 2.5 \\
\lambda(t) - e, & 2.5 < t < 3 \\
0, & 3 < t < 4
\end{cases}$$

and where the thinning algorithm (with a single interval in each case) was used
to simulate the constituent nonhomogeneous processes.

[Figure 11.3: plot of the intensity function $\lambda(t)$ (vertical axis: rates $\lambda(t)$)]


Method 2. Conditional Distribution of the Arrival Times 


Recall the result for a Poisson process having rate $\lambda$ that given the number of events by time $T$ the set of event times are independent and identically distributed uniform $(0, T)$ random variables. Now suppose that each of these events is independently counted with a probability that is equal to $\lambda(t)/\lambda$ when the event occurred at time $t$. Hence, given the number of counted events, it follows that the set of times of these counted events are independent with a common distribution given by $F(s)$, where


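$$
\begin{aligned}
F(s) &= P\{\text{time} \le s \mid \text{counted}\} \\
&= \frac{P\{\text{time} \le s, \text{counted}\}}{P\{\text{counted}\}} \\
&= \frac{\int_0^T P\{\text{time} \le s, \text{counted} \mid \text{time} = x\}\, dx/T}{P\{\text{counted}\}} \\
&= \frac{\int_0^s \lambda(x)\, dx}{\int_0^T \lambda(x)\, dx}
\end{aligned}
$$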
The preceding (somewhat heuristic) argument thus shows that given $n$ events of a nonhomogeneous Poisson process by time $T$ the $n$ event times are independent with a common density function

$$
f(s) = \frac{\lambda(s)}{m(T)}, \quad 0 < s < T, \qquad m(T) = \int_0^T \lambda(s)\, ds \tag{11.10}
$$


Since N(T), the number of events by time T, is Poisson distributed with mean 
m(T), we can simulate the nonhomogeneous Poisson process by first simulating 
N(T) and then simulating N(T) random variables from the density function 
of (11.10). 


Example 11.12 If $\lambda(s) = cs$, then we can simulate the first $T$ time units of the nonhomogeneous Poisson process by first simulating $N(T)$, a Poisson random variable having mean $m(T) = \int_0^T cs\, ds = cT^2/2$, and then simulating $N(T)$ random variables having distribution

$$
F(s) = \frac{s^2}{T^2}, \qquad 0 < s < T
$$

Random variables having the preceding distribution can be simulated either by use of the inverse transform method (since $F^{-1}(U) = T\sqrt{U}$) or by noting that $F$ is the distribution function of $\max(TU_1, TU_2)$ when $U_1$ and $U_2$ are independent random numbers. ■
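A minimal Python sketch of Example 11.12 follows. It assumes a simple Poisson generator built by counting rate-1 exponential arrivals (one of several possible choices), and the names `poisson` and `simulate_linear_intensity` are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Poisson variate: count rate-1 exponential arrivals that fall in (0, mean)."""
    n, total = 0, rng.expovariate(1.0)
    while total < mean:
        n += 1
        total += rng.expovariate(1.0)
    return n


def simulate_linear_intensity(c, T, rng):
    """Event times on (0, T) of a nonhomogeneous Poisson process with lambda(s) = c*s."""
    n = poisson(c * T * T / 2.0, rng)  # N(T) has mean m(T) = c T^2 / 2
    # Given N(T) = n, the times are i.i.d. with F(s) = s^2 / T^2,
    # so the inverse transform gives F^{-1}(U) = T * sqrt(U).
    return sorted(T * math.sqrt(rng.random()) for _ in range(n))
```

For example, `simulate_linear_intensity(2.0, 3.0, random.Random(0))` returns a sorted list whose length has mean $cT^2/2 = 9$.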


If the distribution function specified by Equation (11.10) is not easily invertible, we can always simulate from (11.10) by using the rejection method where we either accept or reject simulated values of uniform $(0, T)$ random variables. That is, let $h(s) = 1/T$, $0 < s < T$. Then

$$
\frac{f(s)}{h(s)} = \frac{T\lambda(s)}{m(T)} \le \frac{\lambda T}{m(T)} \equiv C
$$

where $\lambda$ is a bound on $\lambda(s)$, $0 \le s \le T$. Hence, the rejection method is to generate random numbers $U_1$ and $U_2$, then accept $TU_1$ if

$$
U_2 \le \frac{f(TU_1)}{C\, h(TU_1)}
$$

or, equivalently, if

$$
U_2 \le \frac{\lambda(TU_1)}{\lambda}
$$
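The rejection step can be sketched in Python as follows. The intensity used in the usage note, $\lambda(s) = 3 + 2\sin s$ with bound 5, is a hypothetical example chosen only because $m(T)$ has a closed form; the helper names are likewise illustrative.

```python
import math
import random


def poisson(mean, rng):
    """Poisson variate: count rate-1 exponential arrivals that fall in (0, mean)."""
    n, total = 0, rng.expovariate(1.0)
    while total < mean:
        n += 1
        total += rng.expovariate(1.0)
    return n


def sample_event_time(lam, lam_bound, T, rng):
    """Rejection method: propose T*U1 and accept it if U2 <= lam(T*U1) / lam_bound."""
    while True:
        s = T * rng.random()
        if rng.random() <= lam(s) / lam_bound:
            return s


def simulate_by_conditioning(lam, lam_bound, m_T, T, rng):
    """Simulate N(T) ~ Poisson(m_T), then draw N(T) times from the density lam(s)/m_T."""
    n = poisson(m_T, rng)
    return sorted(sample_event_time(lam, lam_bound, T, rng) for _ in range(n))
```

With `lam = lambda s: 3.0 + 2.0 * math.sin(s)` and `T = 2 * math.pi`, the mean value is `m_T = 3 * T`, so a call would be `simulate_by_conditioning(lam, 5.0, 3 * T, T, random.Random(7))`. The caller must supply `m_T` and a valid bound `lam_bound`; the sketch does not check either.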
Method 3. Simulating the Event Times


The third method we shall present for simulating a nonhomogeneous Poisson process having intensity function $\lambda(t)$, $t \ge 0$, is probably the most basic approach: namely, to simulate the successive event times. So let $X_1, X_2, \ldots$ denote the event times of such a process. As these random variables are dependent we will use the conditional distribution approach to simulation. Hence, we need the conditional distribution of $X_i$ given $X_1, \ldots, X_{i-1}$.

To start, note that if an event occurs at time $x$ then, independent of what has occurred prior to $x$, the time until the next event has the distribution $F_x$ whose tail $\bar F_x = 1 - F_x$ is given by

$$
\begin{aligned}
\bar F_x(t) &= P\{0 \text{ events in } (x, x+t) \mid \text{event at } x\} \\
&= P\{0 \text{ events in } (x, x+t)\} \qquad \text{by independent increments} \\
&= \exp\left\{-\int_0^t \lambda(x+y)\, dy\right\}
\end{aligned}
$$

Differentiation yields that the density corresponding to $F_x$ is

$$
f_x(t) = \lambda(x+t) \exp\left\{-\int_0^t \lambda(x+y)\, dy\right\}
$$

implying that the hazard rate function of $F_x$ is

$$
r_x(t) = \frac{f_x(t)}{\bar F_x(t)} = \lambda(x+t)
$$


We can now simulate the event times $X_1, X_2, \ldots$ by simulating $X_1$ from $F_0$; then if the simulated value of $X_1$ is $x_1$, simulate $X_2$ by adding $x_1$ to a value generated from $F_{x_1}$, and if this sum is $x_2$ simulate $X_3$ by adding $x_2$ to a value generated from $F_{x_2}$, and so on. The method used to simulate from these distributions should depend, of course, on the form of these distributions. However, it is interesting to note that if we let $\lambda$ be such that $\lambda(t) \le \lambda$ and use the hazard rate method to simulate, then we end up with the approach of Method 1 (we leave the verification of this fact as an exercise). Sometimes, however, the distributions $F_x$ can be easily inverted and so the inverse transform method can be applied.


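Example 11.13 Suppose that $\lambda(x) = 1/(x + a)$, $x \ge 0$. Then

$$
\int_0^t \lambda(x+y)\, dy = \log\left(\frac{x+a+t}{x+a}\right)
$$

Hence,

$$
F_x(t) = 1 - \frac{x+a}{x+a+t} = \frac{t}{x+a+t}
$$

and so

$$
F_x^{-1}(u) = (x+a)\frac{u}{1-u}
$$

We can, therefore, simulate the successive event times $X_1, X_2, \ldots$ by generating $U_1, U_2, \ldots$ and then setting

$$
\begin{aligned}
X_1 &= \frac{aU_1}{1-U_1}, \\
X_2 &= (X_1 + a)\frac{U_2}{1-U_2} + X_1 = \frac{X_1 + aU_2}{1-U_2}
\end{aligned}
$$

and, in general,

$$
X_j = (X_{j-1} + a)\frac{U_j}{1-U_j} + X_{j-1} = \frac{X_{j-1} + aU_j}{1-U_j}, \qquad j \ge 2
$$
■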

11.5.2 Simulating a Two-Dimensional Poisson Process 


A point process consisting of randomly occurring points in the plane is said to be a two-dimensional Poisson process having rate $\lambda$ if

(a) the number of points in any given region of area $A$ is Poisson distributed with mean $\lambda A$; and
(b) the numbers of points in disjoint regions are independent.

For a given fixed point $O$ in the plane, we now show how to simulate events occurring according to a two-dimensional Poisson process with rate $\lambda$ in a circular region of radius $r$ centered about $O$. Let $R_i$, $i \ge 1$, denote the distance between $O$ and its $i$th nearest Poisson point, and let $C(a)$ denote the circle of radius $a$ centered at $O$. Then


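$$
P\{\pi R_1^2 > b\} = P\left\{R_1 > \sqrt{b/\pi}\right\} = P\left\{\text{no points in } C\left(\sqrt{b/\pi}\right)\right\} = e^{-\lambda b}
$$

Also, with $C(a_2) - C(a_1)$ denoting the region between $C(a_2)$ and $C(a_1)$:

$$
\begin{aligned}
P\{\pi R_2^2 - \pi R_1^2 > b \mid R_1 = r\}
&= P\left\{R_2 > \sqrt{(b + \pi r^2)/\pi} \mid R_1 = r\right\} \\
&= P\left\{\text{no points in } C\left(\sqrt{(b + \pi r^2)/\pi}\right) - C(r) \mid R_1 = r\right\} \\
&= P\left\{\text{no points in } C\left(\sqrt{(b + \pi r^2)/\pi}\right) - C(r)\right\} \qquad \text{by (b)} \\
&= e^{-\lambda b}
\end{aligned}
$$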
In fact, the same argument can be repeated to obtain the following. 
Proposition 11.6 With $R_0 = 0$,

$$
\pi R_i^2 - \pi R_{i-1}^2, \qquad i \ge 1,
$$

are independent exponentials with rate $\lambda$.


In other words, the amount of area that needs to be traversed to encompass a Poisson point is exponential with rate $\lambda$. Since, by symmetry, the respective angles of the Poisson points are independent and uniformly distributed over $(0, 2\pi)$, we thus have the following algorithm for simulating the Poisson process over a circular region of radius $r$ about $O$:


Step 1: Generate independent exponentials with rate 1, $X_1, X_2, \ldots$, stopping at

$$
N = \min\left\{n : \frac{X_1 + \cdots + X_n}{\lambda\pi} > r^2\right\}
$$

Step 2: If $N = 1$, stop. There are no points in $C(r)$. Otherwise, for $i = 1, \ldots, N-1$, set

$$
R_i = \sqrt{\frac{X_1 + \cdots + X_i}{\lambda\pi}}
$$

Step 3: Generate independent uniform $(0, 1)$ random variables $U_1, \ldots, U_{N-1}$.
Step 4: Return the $N - 1$ Poisson points in $C(r)$ whose polar coordinates are

$$
(R_i, 2\pi U_i), \qquad i = 1, \ldots, N-1
$$
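Steps 1 through 4 can be sketched in Python as follows. The function name is hypothetical, and the sketch merges the steps into a single loop, drawing each angle as soon as its radius is accepted; this yields the same joint distribution because the angles are independent of the radii.

```python
import math
import random


def poisson_points_in_circle(lam, r, rng):
    """Polar coordinates (R_i, theta_i) of the rate-`lam` Poisson points within distance r of O."""
    points, exp_sum = [], 0.0
    while True:
        exp_sum += rng.expovariate(1.0)          # X_1 + ... + X_n, rate-1 exponentials
        radius_sq = exp_sum / (lam * math.pi)    # candidate value of R_n^2
        if radius_sq > r * r:                    # stopping rule of Step 1: this n is N
            return points
        points.append((math.sqrt(radius_sq), 2.0 * math.pi * rng.random()))
```

The points are returned in increasing order of distance from $O$, and the expected number returned is $\lambda\pi r^2$.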


The preceding algorithm requires, on average, $1 + \lambda\pi r^2$ exponentials and an equal number of uniform random numbers. Another approach to simulating points in $C(r)$ is to first simulate $N$, the number of such points, and then use the fact that, given $N$, the points are uniformly distributed in $C(r)$. This latter procedure requires the simulation of $N$, a Poisson random variable with mean $\lambda\pi r^2$; we must then simulate $N$ uniform points on $C(r)$, by simulating $R$ from the distribution $F_R(a) = a^2/r^2$ (see Exercise 25) and $\theta$ from uniform $(0, 2\pi)$, and must then sort these $N$ uniform values in increasing order of $R$. The main advantage of the first procedure is that it eliminates the need to sort.

The preceding algorithm can be thought of as the fanning out of a circle centered at $O$ with a radius that expands continuously from 0 to $r$. The successive radii at which Poisson points are encountered are simulated by noting that the additional area necessary to encompass a Poisson point is always, independent of the past, exponential with rate $\lambda$. This technique can be used to simulate the process over noncircular regions. For instance, consider a nonnegative function $g(x)$, and suppose we are interested in simulating the Poisson process in the region between the $x$-axis and $g$ with $x$ going from 0 to $T$ (see Figure 11.4). To do so we can start at the left-hand end and fan vertically to the right by considering the successive areas $\int_0^a g(x)\, dx$. Now if $X_1 < X_2 < \cdots$ denote the successive projections of the Poisson points on the $x$-axis, then analogous to Proposition 11.6, it will follow that (with $X_0 = 0$) $\lambda \int_{X_{i-1}}^{X_i} g(x)\, dx$, $i \ge 1$, will be independent exponentials with rate 1.

[Figure 11.4]

Hence, we should simulate $\epsilon_1, \epsilon_2, \ldots$, independent exponentials with rate 1, stopping at


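$$
N = \min\left\{n : \epsilon_1 + \cdots + \epsilon_n > \lambda \int_0^T g(x)\, dx\right\}
$$

and determine $X_1, \ldots, X_{N-1}$ by

$$
\begin{aligned}
\lambda \int_0^{X_1} g(x)\, dx &= \epsilon_1, \\
\lambda \int_{X_1}^{X_2} g(x)\, dx &= \epsilon_2, \\
&\;\;\vdots \\
\lambda \int_{X_{N-2}}^{X_{N-1}} g(x)\, dx &= \epsilon_{N-1}
\end{aligned}
$$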

If we now simulate $U_1, \ldots, U_{N-1}$ (independent uniform $(0, 1)$ random numbers), then as the projection on the $y$-axis of the Poisson point whose $x$-coordinate is $X_i$ is uniform on $(0, g(X_i))$, it follows that the simulated Poisson points in the interval are $(X_i, U_i g(X_i))$, $i = 1, \ldots, N-1$.

Of course, the preceding technique is most useful when $g$ is regular enough so that the foregoing equations can be solved for the $X_i$. For instance, if $g(x) = y$ (and so the region of interest is a rectangle), then

$$
X_i = \frac{\epsilon_1 + \cdots + \epsilon_i}{\lambda y}, \qquad i = 1, \ldots, N-1
$$

and the Poisson points are

$$
(X_i, yU_i), \qquad i = 1, \ldots, N-1
$$
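The fanning procedure can be sketched in Python for any $g$ whose cumulative area can be inverted. In this sketch the caller supplies `area_inverse`, a hypothetical helper returning the $x$ for which $\int_0^x g(u)\, du$ equals a given area, together with `total_area` $= \int_0^T g(x)\, dx$; neither name comes from the text.

```python
import math
import random


def poisson_points_under_curve(lam, g, area_inverse, total_area, rng):
    """Rate-`lam` Poisson points between the x-axis and g, for x between 0 and T.

    area_inverse(a) must return the x with integral_0^x g = a;
    total_area must equal integral_0^T g.
    """
    points, eps_sum = [], 0.0
    while True:
        eps_sum += rng.expovariate(1.0)       # epsilon_1 + ... + epsilon_n
        if eps_sum > lam * total_area:        # stopping rule: this n is N
            return points
        x = area_inverse(eps_sum / lam)       # solves lam * integral_0^x g = eps_sum
        points.append((x, rng.random() * g(x)))
```

For the rectangle $g(x) = y$ one would pass `g=lambda x: y`, `area_inverse=lambda a: a / y`, and `total_area=y * T`, which reproduces $X_i = (\epsilon_1 + \cdots + \epsilon_i)/(\lambda y)$. As a second, assumed example, the triangle $g(x) = x$ has `area_inverse=lambda a: math.sqrt(2 * a)` and `total_area=T * T / 2`.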
11.6 Variance Reduction Techniques


Let $X_1, \ldots, X_n$ have a given joint distribution, and suppose we are interested in computing

$$
\theta = E[g(X_1, \ldots, X_n)]
$$

where $g$ is some specified function. It is often the case that it is not possible to analytically compute the preceding, and when such is the case we can attempt to use simulation to estimate $\theta$. This is done as follows: Generate $X_1^{(1)}, \ldots, X_n^{(1)}$ having the same joint distribution as $X_1, \ldots, X_n$, and set

$$
Y_1 = g\left(X_1^{(1)}, \ldots, X_n^{(1)}\right)
$$

Now, simulate a second set of random variables (independent of the first set) $X_1^{(2)}, \ldots, X_n^{(2)}$ having the distribution of $X_1, \ldots, X_n$ and set

$$
Y_2 = g\left(X_1^{(2)}, \ldots, X_n^{(2)}\right)
$$


Continue this until you have generated $k$ (some predetermined number) sets, and so have also computed $Y_1, Y_2, \ldots, Y_k$. Now, $Y_1, \ldots, Y_k$ are independent and identically distributed random variables each having the same distribution as $g(X_1, \ldots, X_n)$. Thus, if we let $\bar Y$ denote the average of these $k$ random variables, that is,

$$
\bar Y = \sum_{i=1}^k \frac{Y_i}{k}
$$

then

$$
E[\bar Y] = \theta, \qquad E[(\bar Y - \theta)^2] = \operatorname{Var}(\bar Y)
$$

Hence, we can use $\bar Y$ as an estimate of $\theta$. As the expected square of the difference between $\bar Y$ and $\theta$ is equal to the variance of $\bar Y$, we would like this quantity to be as small as possible. In the preceding situation, $\operatorname{Var}(\bar Y) = \operatorname{Var}(Y_i)/k$, which is usually not known in advance but must be estimated from the generated values $Y_1, \ldots, Y_k$. We now present three general techniques for reducing the variance of our estimator.
11.6.1 Use of Antithetic Variables


In the preceding situation, suppose that we have generated $Y_1$ and $Y_2$, identically distributed random variables having mean $\theta$. Now,

$$
\begin{aligned}
\operatorname{Var}\left(\frac{Y_1 + Y_2}{2}\right) &= \frac{1}{4}\left[\operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) + 2\operatorname{Cov}(Y_1, Y_2)\right] \\
&= \frac{\operatorname{Var}(Y_1)}{2} + \frac{\operatorname{Cov}(Y_1, Y_2)}{2}
\end{aligned}
$$

Hence, it would be advantageous (in the sense that the variance would be reduced) if $Y_1$ and $Y_2$ rather than being independent were negatively correlated. To see how we could arrange this, let us suppose that the random variables $X_1, \ldots, X_n$ are independent and, in addition, that each is simulated via the inverse transform technique. That is, $X_i$ is simulated from $F_i^{-1}(U_i)$ where $U_i$ is a random number and $F_i$ is the distribution of $X_i$. Hence, $Y_1$ can be expressed as

$$
Y_1 = g\left(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n)\right)
$$

Now, since $1 - U$ is also uniform over $(0, 1)$ whenever $U$ is a random number (and is negatively correlated with $U$) it follows that $Y_2$ defined by

$$
Y_2 = g\left(F_1^{-1}(1 - U_1), \ldots, F_n^{-1}(1 - U_n)\right)
$$

will have the same distribution as $Y_1$. Hence, if $Y_1$ and $Y_2$ were negatively correlated, then generating $Y_2$ by this means would lead to a smaller variance than if it were generated by a new set of random numbers. (In addition, there is a computational savings since rather than having to generate $n$ additional random numbers, we need only subtract each of the previous $n$ from 1.) The following theorem will be the key to showing that this technique, known as the use of antithetic variables, will lead to a reduction in variance whenever $g$ is a monotone function.


Theorem 11.1 If $X_1, \ldots, X_n$ are independent, then, for any increasing functions $f$ and $g$ of $n$ variables,

$$
E[f(\mathbf{X})g(\mathbf{X})] \ge E[f(\mathbf{X})]\,E[g(\mathbf{X})] \tag{11.11}
$$

where $\mathbf{X} = (X_1, \ldots, X_n)$.


Proof. The proof is by induction on $n$. To prove it when $n = 1$, let $f$ and $g$ be increasing functions of a single variable. Then, for any $x$ and $y$,

$$
(f(x) - f(y))(g(x) - g(y)) \ge 0
$$

since if $x \ge y$ ($x \le y$) then both factors are nonnegative (nonpositive). Hence, for any random variables $X$ and $Y$,

$$
(f(X) - f(Y))(g(X) - g(Y)) \ge 0
$$

implying that

$$
E[(f(X) - f(Y))(g(X) - g(Y))] \ge 0
$$

or, equivalently,

$$
E[f(X)g(X)] + E[f(Y)g(Y)] \ge E[f(X)g(Y)] + E[f(Y)g(X)]
$$

If we suppose that $X$ and $Y$ are independent and identically distributed then, as in this case,

$$
\begin{aligned}
E[f(X)g(X)] &= E[f(Y)g(Y)], \\
E[f(X)g(Y)] &= E[f(Y)g(X)] = E[f(X)]\,E[g(X)]
\end{aligned}
$$

and so we obtain the result when $n = 1$.


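So assume that (11.11) holds for $n - 1$ variables, and now suppose that $X_1, \ldots, X_n$ are independent and $f$ and $g$ are increasing functions. Then

$$
\begin{aligned}
E[f(\mathbf{X})g(\mathbf{X}) \mid X_n = x_n]
&= E[f(X_1, \ldots, X_{n-1}, x_n)\, g(X_1, \ldots, X_{n-1}, x_n) \mid X_n = x_n] \\
&= E[f(X_1, \ldots, X_{n-1}, x_n)\, g(X_1, \ldots, X_{n-1}, x_n)] \qquad \text{by independence} \\
&\ge E[f(X_1, \ldots, X_{n-1}, x_n)]\, E[g(X_1, \ldots, X_{n-1}, x_n)] \qquad \text{by the induction hypothesis} \\
&= E[f(\mathbf{X}) \mid X_n = x_n]\, E[g(\mathbf{X}) \mid X_n = x_n]
\end{aligned}
$$

Hence,

$$
E[f(\mathbf{X})g(\mathbf{X}) \mid X_n] \ge E[f(\mathbf{X}) \mid X_n]\, E[g(\mathbf{X}) \mid X_n]
$$

and, upon taking expectations of both sides,

$$
\begin{aligned}
E[f(\mathbf{X})g(\mathbf{X})] &\ge E\big[E[f(\mathbf{X}) \mid X_n]\, E[g(\mathbf{X}) \mid X_n]\big] \\
&\ge E[f(\mathbf{X})]\, E[g(\mathbf{X})]
\end{aligned}
$$

The last inequality follows because $E[f(\mathbf{X}) \mid X_n]$ and $E[g(\mathbf{X}) \mid X_n]$ are both increasing functions of $X_n$, and so, by the result for $n = 1$,

$$
\begin{aligned}
E\big[E[f(\mathbf{X}) \mid X_n]\, E[g(\mathbf{X}) \mid X_n]\big] &\ge E\big[E[f(\mathbf{X}) \mid X_n]\big]\, E\big[E[g(\mathbf{X}) \mid X_n]\big] \\
&= E[f(\mathbf{X})]\, E[g(\mathbf{X})]
\end{aligned}
$$
■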
Corollary 11.7 If $U_1, \ldots, U_n$ are independent, and $k$ is either an increasing or decreasing function, then

$$
\operatorname{Cov}(k(U_1, \ldots, U_n), k(1 - U_1, \ldots, 1 - U_n)) \le 0
$$

Proof. Suppose $k$ is increasing. As $-k(1 - U_1, \ldots, 1 - U_n)$ is increasing in $U_1, \ldots, U_n$, then, from Theorem 11.1,

$$
\operatorname{Cov}(k(U_1, \ldots, U_n), -k(1 - U_1, \ldots, 1 - U_n)) \ge 0
$$

When $k$ is decreasing just replace $k$ by its negative. ■


Since $F_i^{-1}(U_i)$ is increasing in $U_i$ (as $F_i$, being a distribution function, is increasing) it follows that $g(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n))$ is a monotone function of $U_1, \ldots, U_n$ whenever $g$ is monotone. Hence, if $g$ is monotone the antithetic variable approach of twice using each set of random numbers $U_1, \ldots, U_n$ by first computing $g(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n))$ and then $g(F_1^{-1}(1 - U_1), \ldots, F_n^{-1}(1 - U_n))$ will reduce the variance of the estimate of $E[g(X_1, \ldots, X_n)]$. That is, rather than generating $k$ sets of $n$ random numbers, we should generate $k/2$ sets and use each set twice.
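To make the variance comparison concrete, here is a small Python sketch for the one-variable case $n = 1$ with $X = U$. The choice $g(u) = e^u$ in the usage note, for which $\theta = e - 1$, is an assumed example, as are the function names.

```python
import math
import random
import statistics


def independent_pairs(g, pairs, rng):
    """Averages (g(U) + g(U')) / 2 using two independent random numbers per pair."""
    return [(g(rng.random()) + g(rng.random())) / 2.0 for _ in range(pairs)]


def antithetic_pairs(g, pairs, rng):
    """Averages (g(U) + g(1 - U)) / 2, reusing each random number antithetically."""
    out = []
    for _ in range(pairs):
        u = rng.random()
        out.append((g(u) + g(1.0 - u)) / 2.0)
    return out
```

With `g = math.exp`, comparing `statistics.variance(antithetic_pairs(...))` against `statistics.variance(independent_pairs(...))` for the same number of pairs should show a much smaller value for the antithetic version, since $e^u$ is monotone in $u$.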


Example 11.14 (Simulating the Reliability Function) Consider a system of $n$ components in which component $i$, independently of other components, works with probability $p_i$, $i = 1, \ldots, n$. Letting

$$
X_i = \begin{cases} 1, & \text{if component } i \text{ works} \\ 0, & \text{otherwise} \end{cases}
$$

suppose there is a monotone structure function $\phi$ such that

$$
\phi(X_1, \ldots, X_n) = \begin{cases} 1, & \text{if the system works under } X_1, \ldots, X_n \\ 0, & \text{otherwise} \end{cases}
$$

We are interested in using simulation to estimate

$$
r(p_1, \ldots, p_n) = E[\phi(X_1, \ldots, X_n)] = P\{\phi(X_1, \ldots, X_n) = 1\}
$$

Now, we can simulate the $X_i$ by generating uniform random numbers $U_1, \ldots, U_n$ and then setting

$$
X_i = \begin{cases} 1, & \text{if } U_i < p_i \\ 0, & \text{otherwise} \end{cases}
$$

Hence, we see that

$$
\phi(X_1, \ldots, X_n) = k(U_1, \ldots, U_n)
$$

where $k$ is a decreasing function of $U_1, \ldots, U_n$. Hence,

$$
\operatorname{Cov}(k(\mathbf{U}), k(\mathbf{1} - \mathbf{U})) \le 0
$$

and so the antithetic variable approach of using $U_1, \ldots, U_n$ to generate both $k(U_1, \ldots, U_n)$ and $k(1 - U_1, \ldots, 1 - U_n)$ results in a smaller variance than if an independent set of random numbers was used to generate the second $k$. ■
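The following Python sketch illustrates Example 11.14 for one particular monotone structure, a two-out-of-three system. That structure function and the component probabilities used in the usage note are assumptions made only for illustration.

```python
import random
import statistics


def two_out_of_three(x):
    """Monotone structure function: the system works if at least two components work."""
    return 1 if sum(x) >= 2 else 0


def k(us, ps, phi=two_out_of_three):
    """k(U_1, ..., U_n) = phi(X_1, ..., X_n) with X_i = 1 if U_i < p_i, else 0."""
    return phi([1 if u < p else 0 for u, p in zip(us, ps)])


def reliability_estimates(ps, pairs, rng):
    """Return (pair averages using 1 - U, pair averages using fresh random numbers)."""
    anti, fresh = [], []
    for _ in range(pairs):
        us = [rng.random() for _ in ps]
        first = k(us, ps)
        anti.append((first + k([1.0 - u for u in us], ps)) / 2.0)
        fresh.append((first + k([rng.random() for _ in ps], ps)) / 2.0)
    return anti, fresh
```

For `ps = [0.9, 0.8, 0.7]` the exact reliability is 0.902, so both lists should average close to that value, with the antithetic list showing the smaller sample variance.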


Example 11.15 (Simulating a Queueing System) Consider a given queueing system, let $D_i$ denote the delay in queue of the $i$th arriving customer, and suppose we are interested in simulating the system so as to estimate

$$
\theta = E[D_1 + \cdots + D_n]
$$

Let $X_1, \ldots, X_n$ denote the first $n$ interarrival times and $S_1, \ldots, S_n$ the first $n$ service times of this system, and suppose these random variables are all independent. Now in most systems $D_1 + \cdots + D_n$ will be a function of $X_1, \ldots, X_n, S_1, \ldots, S_n$, say,

$$
D_1 + \cdots + D_n = g(X_1, \ldots, X_n, S_1, \ldots, S_n)
$$

Also, $g$ will usually be increasing in $S_i$ and decreasing in $X_i$, $i = 1, \ldots, n$. If we use the inverse transform method to simulate $X_i, S_i$, $i = 1, \ldots, n$, say, $X_i = F_i^{-1}(1 - U_i)$, $S_i = G_i^{-1}(\bar U_i)$ where $U_1, \ldots, U_n, \bar U_1, \ldots, \bar U_n$ are independent uniform random numbers, then we may write

$$
D_1 + \cdots + D_n = k(U_1, \ldots, U_n, \bar U_1, \ldots, \bar U_n)
$$

where $k$ is increasing in its variates. Hence, the antithetic variable approach will reduce the variance of the estimator of $\theta$. (Thus, we would generate $U_i, \bar U_i$, $i = 1, \ldots, n$, and set $X_i = F_i^{-1}(1 - U_i)$ and $S_i = G_i^{-1}(\bar U_i)$ for the first run, and $X_i = F_i^{-1}(U_i)$ and $S_i = G_i^{-1}(1 - \bar U_i)$ for the second.) As all the $U_i$ and $\bar U_i$ are independent, however, this is equivalent to setting $X_i = F_i^{-1}(U_i)$, $S_i = G_i^{-1}(\bar U_i)$ in the first run and using $1 - U_i$ for $U_i$ and $1 - \bar U_i$ for $\bar U_i$ in the second. ■


11.6.2 Variance Reduction by Conditioning 


Let us start by recalling (see Proposition 3.1) the conditional variance formula

$$
\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid Z)] + \operatorname{Var}(E[Y \mid Z]) \tag{11.12}
$$

Now suppose we are interested in estimating $E[g(X_1, \ldots, X_n)]$ by simulating $\mathbf{X} = (X_1, \ldots, X_n)$ and then computing $Y = g(X_1, \ldots, X_n)$. Now, if for some random variable $Z$ we can compute $E[Y \mid Z]$ then, as $\operatorname{Var}(Y \mid Z) \ge 0$, it follows from the conditional variance formula that

$$
\operatorname{Var}(E[Y \mid Z]) \le \operatorname{Var}(Y)
$$

implying, since $E[E[Y \mid Z]] = E[Y]$, that $E[Y \mid Z]$ is a better estimator of $E[Y]$ than is $Y$.

In many situations, there are a variety of $Z_i$ that can be conditioned on to obtain an improved estimator. Each of these estimators $E[Y \mid Z_i]$ will have mean $E[Y]$ and smaller variance than does the raw estimator $Y$. We now show that for any choice of weights $\lambda_i$, $\lambda_i \ge 0$, $\sum_i \lambda_i = 1$, $\sum_i \lambda_i E[Y \mid Z_i]$ is also an improvement over $Y$.


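Proposition 11.8 For any $\lambda_i \ge 0$, $\sum_{i=1}^{\infty} \lambda_i = 1$,

(a) $E\left[\sum_i \lambda_i E[Y \mid Z_i]\right] = E[Y]$,
(b) $\operatorname{Var}\left(\sum_i \lambda_i E[Y \mid Z_i]\right) \le \operatorname{Var}(Y)$.

Proof. The proof of (a) is immediate. To prove (b), let $N$ denote an integer valued random variable independent of all the other random variables under consideration and such that

$$
P\{N = i\} = \lambda_i, \qquad i \ge 1
$$

Applying the conditional variance formula twice yields

$$
\begin{aligned}
\operatorname{Var}(Y) &\ge \operatorname{Var}(E[Y \mid N, Z_N]) \\
&\ge \operatorname{Var}(E[E[Y \mid N, Z_N] \mid Z_1, \ldots]) \\
&= \operatorname{Var}\left(\sum_i \lambda_i E[Y \mid Z_i]\right)
\end{aligned}
$$
■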


Example 11.16 Consider a queueing system having Poisson arrivals and suppose that any customer arriving when there are already $N$ others in the system is lost. Suppose that we are interested in using simulation to estimate the expected number of lost customers by time $t$. The raw simulation approach would be to simulate the system up to time $t$ and determine $L$, the number of lost customers for that run. A better estimate, however, can be obtained by conditioning on the total time in $[0, t]$ that the system is at capacity. Indeed, if we let $T$ denote the time in $[0, t]$ that there are $N$ in the system, then

$$
E[L \mid T] = \lambda T
$$

where $\lambda$ is the Poisson arrival rate. Hence, a better estimate for $E[L]$ than the average value of $L$ over all simulation runs can be obtained by multiplying the average value of $T$ per simulation run by $\lambda$. If the arrival process were a nonhomogeneous Poisson process, then we could improve over the raw estimator $L$ by keeping track of those time periods for which the system is at capacity. If we let $I_1, \ldots, I_C$ denote the time intervals in $[0, t]$ in which there are $N$ in the system, then

$$
E[L \mid I_1, \ldots, I_C] = \sum_{i=1}^{C} \int_{I_i} \lambda(s)\, ds
$$

where $\lambda(s)$ is the intensity function of the nonhomogeneous Poisson arrival process. The use of the right side of the preceding would thus lead to a better estimate of $E[L]$ than the raw estimator $L$. ■
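As an illustration of Example 11.16, the Python sketch below simulates a loss system and returns both $L$ and $T$ from each run. The example does not specify the service mechanism; a single server with exponential service times at rate `mu` is assumed here purely so that the system can be simulated as a race between exponential clocks, and the system is assumed to start empty.

```python
import random
import statistics


def simulate_loss_system(lam, mu, capacity, t_end, rng):
    """One run on [0, t_end]; returns (L, T).

    L = number of arrivals lost because `capacity` customers were present,
    T = total time spent with `capacity` customers in the system.
    Exponential(mu) service by a single server is an assumption of this sketch.
    """
    clock, in_system, lost, time_full = 0.0, 0, 0, 0.0
    while True:
        rate = lam + (mu if in_system > 0 else 0.0)
        step = rng.expovariate(rate)          # time until the next arrival or departure
        if clock + step >= t_end:
            if in_system == capacity:
                time_full += t_end - clock
            return lost, time_full
        if in_system == capacity:
            time_full += step
        clock += step
        if rng.random() < lam / rate:         # the next event is an arrival
            if in_system == capacity:
                lost += 1
            else:
                in_system += 1
        else:                                 # the next event is a departure
            in_system -= 1
```

Over many runs, the average of `lam * T` should agree with the average of `L` while having a smaller sample variance, which is the improvement the example describes.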


Example 11.17 Suppose that we wanted to estimate the expected sum of the times in the system of the first $n$ customers in a queueing system. That is, if $W_i$ is the time that the $i$th customer spends in the system, then we are interested in estimating

$$
\theta = E\left[\sum_{i=1}^n W_i\right]
$$

Let $Y_i$ denote the “state of the system” at the moment at which the $i$th customer arrives. It can be shown⁵ that for a wide class of models the estimator $\sum_{i=1}^n E[W_i \mid Y_i]$ has (the same mean and) a smaller variance than the estimator $\sum_{i=1}^n W_i$. (It should be noted that whereas it is immediate that $E[W_i \mid Y_i]$ has smaller variance than $W_i$, because of the covariance terms involved it is not immediately apparent that $\sum_{i=1}^n E[W_i \mid Y_i]$ has smaller variance than $\sum_{i=1}^n W_i$.) For instance, in the model G/M/1

$$
E[W_i \mid Y_i] = (N_i + 1)/\mu
$$

where $N_i$ is the number in the system encountered by the $i$th arrival and $1/\mu$ is the mean service time; the result implies that $\sum_{i=1}^n (N_i + 1)/\mu$ is a better estimate of the expected total time in the system of the first $n$ customers than is the raw estimator $\sum_{i=1}^n W_i$. ■


⁵ S. M. Ross, “Simulating Average Delay—Variance Reduction by Conditioning,” Probability in the Engineering and Informational Sciences 2(3), (1988), pp. 309-312.


Example 11.18 (Estimating the Renewal Function by Simulation) Consider a queueing model in which customers arrive daily in accordance with a renewal process having interarrival distribution $F$. However, suppose that at some fixed time $T$, for instance 5 P.M., no additional arrivals are permitted and those customers that are still in the system are serviced. At the start of the next and each succeeding day customers again begin to arrive in accordance with the renewal process. Suppose we are interested in determining the average time that a customer spends in the system. Upon using the theory of renewal reward processes (with a cycle starting every $T$ time units), it can be shown that

$$
\text{average time that a customer spends in the system} = \frac{E[\text{sum of the times in the system of arrivals in } (0, T)]}{m(T)}
$$

where $m(T)$ is the expected number of renewals in $(0, T)$.

If we were to use simulation to estimate the preceding quantity, a run would consist of simulating a single day, and as part of a simulation run, we would observe the quantity $N(T)$, the number of arrivals by time $T$. Since $E[N(T)] = m(T)$, the natural simulation estimator of $m(T)$ would be the average (over all simulated days) value of $N(T)$ obtained. However, $\operatorname{Var}(N(T))$ is, for large $T$, proportional to $T$ (its asymptotic form being $T\sigma^2/\mu^3$, where $\sigma^2$ is the variance and $\mu$ the mean of the interarrival distribution $F$), and so, for large $T$, the variance of our estimator would be large. A considerable improvement can be obtained by using the analytic formula (see Section 7.3)

$$
m(T) = \frac{T}{\mu} - 1 + \frac{E[Y(T)]}{\mu} \tag{11.13}
$$

where $Y(T)$ denotes the time from $T$ until the next renewal; that is, it is the excess life at $T$. Since the variance of $Y(T)$ does not grow with $T$ (indeed, it converges to a finite value provided the moments of $F$ are finite), it follows that for $T$ large, we would do much better by using the simulation to estimate $E[Y(T)]$ and then using Equation (11.13) to estimate $m(T)$.

However, by employing conditioning, we can improve further on our estimate of $m(T)$. To do so, let $A(T)$ denote the age of the renewal process at time $T$; that is, it is the time at $T$ since the last renewal. Then, rather than using the value of $Y(T)$, we can reduce the variance by considering $E[Y(T) \mid A(T)]$. Now, knowing that the age at $T$ is equal to $x$ is equivalent to knowing that there was a renewal at time $T - x$ and the next interarrival time $X$ is greater than $x$. Since the excess at $T$ will equal $X - x$ (see Figure 11.5), it follows that

$$
\begin{aligned}
E[Y(T) \mid A(T) = x] &= E[X - x \mid X > x] \\
&= \int_0^\infty \frac{P\{X - x > t\}}{P\{X > x\}}\, dt \\
&= \int_0^\infty \frac{1 - F(t + x)}{1 - F(x)}\, dt
\end{aligned}
$$

which can be numerically evaluated if necessary.
Figure 11.5 $A(T) = x$.


As an illustration of the preceding note that if the renewal process is a Poisson process with rate $\lambda$, then the raw simulation estimator $N(T)$ will have variance $\lambda T$; since $Y(T)$ will be exponential with rate $\lambda$, the estimator based on (11.13) will have variance $\lambda^2 \operatorname{Var}\{Y(T)\} = 1$. On the other hand, since $Y(T)$ will be independent of $A(T)$ (and $E[Y(T) \mid A(T)] = 1/\lambda$), it follows that the variance of the improved estimator $E[Y(T) \mid A(T)]$ is 0. That is, conditioning on the age at time $T$ yields, in this case, the exact answer. ■
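The three estimators of $m(T)$ discussed in Example 11.18 can be compared with a short Python sketch. Uniform $(0, 1)$ interarrival times are assumed here (not taken from the text) because then $\mu = 1/2$ and $E[X - x \mid X > x] = (1 - x)/2$ in closed form; the function names are hypothetical.

```python
import random
import statistics

MU = 0.5  # mean of the Uniform(0, 1) interarrival distribution assumed in this sketch


def one_day(T, rng):
    """Simulate renewals until one falls after T; return (N(T), excess Y(T), age A(T))."""
    count, last, nxt = 0, 0.0, rng.random()
    while nxt <= T:
        count += 1
        last = nxt
        nxt += rng.random()
    return count, nxt - T, T - last


def three_estimates(T, rng):
    """Raw N(T), the (11.13)-based value using Y(T), and the value using E[Y(T) | A(T)]."""
    n, excess, age = one_day(T, rng)
    via_excess = T / MU - 1.0 + excess / MU
    cond_excess = (1.0 - age) / 2.0          # E[X - x | X > x] for Uniform(0, 1), x = age
    via_age = T / MU - 1.0 + cond_excess / MU
    return float(n), via_excess, via_age
```

Repeating `three_estimates(10.0, rng)` many times and comparing sample variances column by column should show the ordering described in the text: the raw estimator is the most variable and the one conditioned on the age the least.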


Example 11.19 Consider the M/G/1 queueing system where customers arrive in accordance with a Poisson process with rate $\lambda$ to a single server having service distribution $G$ with mean $E[S]$. Suppose that, for a specified time $t_0$, the server will take a break at the first time $t \ge t_0$ at which the system is empty. That is, if $X(t)$ is the number of customers in the system at time $t$, then the server will take a break at time

$$
T = \min\{t \ge t_0 : X(t) = 0\}
$$

To efficiently use simulation to estimate $E[T]$, generate the system to time $t_0$; let $R$ denote the remaining service time of the customer in service at time $t_0$, and let $X_Q$ equal the number of customers waiting in queue at time $t_0$. (Note that $R$ is equal to 0 if $X(t_0) = 0$, and $X_Q = (X(t_0) - 1)^+$.) Now, with $N$ equal to the number of customers that arrive in the remaining service time $R$, it follows that if $N = n$ and $X_Q = n_Q$, then the additional amount of time from $t_0 + R$ until the server can take a break is equal to the amount of time that it takes until the system, starting with $n + n_Q$ customers, becomes empty. Because this is equal to the sum of $n + n_Q$ busy periods, it follows from Section 8.5.3 that

$$
E[T \mid R, N, X_Q] = t_0 + R + (N + X_Q)\frac{E[S]}{1 - \lambda E[S]}
$$

Consequently,

$$
\begin{aligned}
E[T \mid R, X_Q] &= E\big[E[T \mid R, N, X_Q] \mid R, X_Q\big] \\
&= t_0 + R + (E[N \mid R, X_Q] + X_Q)\frac{E[S]}{1 - \lambda E[S]} \\
&= t_0 + R + (\lambda R + X_Q)\frac{E[S]}{1 - \lambda E[S]}
\end{aligned}
$$

Thus, rather than using the generated value of $T$ as the estimator from a simulation run, it is better to stop the simulation at time $t_0$ and use the estimator

$$
t_0 + R + (\lambda R + X_Q)\frac{E[S]}{1 - \lambda E[S]}
$$
■
11.6.3 Control Variates 


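Again suppose we want to use simulation to estimate $E[g(\mathbf{X})]$ where $\mathbf{X} = (X_1, \ldots, X_n)$. But now suppose that for some function $f$ the expected value of $f(\mathbf{X})$ is known, say, $E[f(\mathbf{X})] = \mu$. Then for any constant $a$ we can also use

$$
W = g(\mathbf{X}) + a(f(\mathbf{X}) - \mu)
$$

as an estimator of $E[g(\mathbf{X})]$. Now,

$$
\operatorname{Var}(W) = \operatorname{Var}(g(\mathbf{X})) + a^2 \operatorname{Var}(f(\mathbf{X})) + 2a \operatorname{Cov}(g(\mathbf{X}), f(\mathbf{X}))
$$

Simple calculus shows that the preceding is minimized when

$$
a = \frac{-\operatorname{Cov}(f(\mathbf{X}), g(\mathbf{X}))}{\operatorname{Var}(f(\mathbf{X}))}
$$

and, for this value of $a$,

$$
\operatorname{Var}(W) = \operatorname{Var}(g(\mathbf{X})) - \frac{[\operatorname{Cov}(f(\mathbf{X}), g(\mathbf{X}))]^2}{\operatorname{Var}(f(\mathbf{X}))}
$$

Because $\operatorname{Var}(f(\mathbf{X}))$ and $\operatorname{Cov}(f(\mathbf{X}), g(\mathbf{X}))$ are usually unknown, the simulated data should be used to estimate these quantities.

Dividing the preceding equation by $\operatorname{Var}(g(\mathbf{X}))$ shows that

$$
\frac{\operatorname{Var}(W)}{\operatorname{Var}(g(\mathbf{X}))} = 1 - \operatorname{Corr}^2(f(\mathbf{X}), g(\mathbf{X}))
$$

where $\operatorname{Corr}(X, Y)$ is the correlation between $X$ and $Y$. Consequently, the use of a control variate will greatly reduce the variance of the simulation estimator whenever $f(\mathbf{X})$ and $g(\mathbf{X})$ are strongly correlated.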
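A small Python sketch of the control variate estimator follows, with the constant $a$ estimated from the simulated data as the text suggests. The usage note takes $g(u) = e^u$ with control $f(u) = u$ and known mean $1/2$ for a uniform $(0, 1)$ random number; that pairing and the function name are assumptions made for illustration.

```python
import math
import random
import statistics


def control_variate_estimate(g, f, mu_f, xs):
    """Estimate E[g(X)] from samples xs using f(X), whose mean mu_f is known, as control.

    Returns (estimate, a, sample variance of W, sample variance of g(X)).
    """
    gs = [g(x) for x in xs]
    fs = [f(x) for x in xs]
    g_bar, f_bar = statistics.fmean(gs), statistics.fmean(fs)
    cov = sum((gi - g_bar) * (fi - f_bar) for gi, fi in zip(gs, fs)) / (len(xs) - 1)
    a = -cov / statistics.variance(fs)       # estimate of -Cov(f, g) / Var(f)
    ws = [gi + a * (fi - mu_f) for gi, fi in zip(gs, fs)]
    return statistics.fmean(ws), a, statistics.variance(ws), statistics.variance(gs)
```

For example, with `xs = [rng.random() for _ in range(20000)]`, the call `control_variate_estimate(math.exp, lambda u: u, 0.5, xs)` should return an estimate near $e - 1$ with a negative `a`. Estimating $a$ from the same data used for the average introduces a small bias that vanishes as the sample grows.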


Example 11.20 Consider a continuous-time Markov chain that, upon entering state $i$, spends an exponential time with rate $v_i$ in that state before making a transition into some other state, with the transition being into state $j$ with probability $P_{i,j}$, $i \ge 0$, $j \ne i$. Suppose that costs are incurred at rate $C(i) \ge 0$ per unit time whenever the chain is in state $i$, $i \ge 0$. With $X(t)$ equal to the state at time $t$, and $\alpha$ being a constant such that $0 < \alpha < 1$, the quantity

$$
W = \int_0^\infty e^{-\alpha t} C(X(t))\, dt
$$

represents the total discounted cost. For a given initial state, suppose we want to use simulation to estimate $E[W]$. Whereas at first it might seem that we cannot obtain an unbiased estimator without simulating the continuous-time Markov chain for an infinite amount of time (which is clearly impossible), we can make use of the results of Example 5.1, which gives the equivalent expression for $E[W]$:

$$
E[W] = E\left[\int_0^T C(X(t))\, dt\right]
$$

where $T$ is an exponential random variable with rate $\alpha$ that is independent of the continuous-time Markov chain. Therefore, we can first generate the value of $T$, then generate the states of the continuous-time Markov chain up to time $T$, to obtain the unbiased estimator $\int_0^T C(X(t))\, dt$. Because all the cost rates are nonnegative this estimator is strongly positively correlated with $T$, which will thus make an effective control variate. ■
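A Python sketch of Example 11.20 for the simplest possible case, a chain that alternates between two states, is given below. The two-state restriction, the rates and costs used in the usage note, and the function names are all assumptions; the known mean $E[T] = 1/\alpha$ is what makes $T$ usable as a control variate.

```python
import random
import statistics


def cost_until(T, rates, costs, start, rng):
    """Integral of C(X(t)) over (0, T) for a chain alternating between states 0 and 1."""
    state, clock, total = start, 0.0, 0.0
    while True:
        hold = rng.expovariate(rates[state])     # exponential holding time with rate v_state
        if clock + hold >= T:
            return total + costs[state] * (T - clock)
        total += costs[state] * hold
        clock += hold
        state = 1 - state


def discounted_cost_estimates(alpha, rates, costs, start, runs, rng):
    """Return (raw mean, control-variate mean, raw variance, control-variate variance)."""
    ts = [rng.expovariate(alpha) for _ in range(runs)]   # T is exponential with rate alpha
    ys = [cost_until(t, rates, costs, start, rng) for t in ts]
    y_bar, t_bar = statistics.fmean(ys), statistics.fmean(ts)
    cov = sum((y - y_bar) * (t - t_bar) for y, t in zip(ys, ts)) / (runs - 1)
    a = -cov / statistics.variance(ts)
    ws = [y + a * (t - 1.0 / alpha) for y, t in zip(ys, ts)]   # E[T] = 1 / alpha is known
    return y_bar, statistics.fmean(ws), statistics.variance(ys), statistics.variance(ws)
```

With `alpha = 0.5`, `rates = (1.0, 2.0)`, `costs = (1.0, 3.0)` and `start = 0`, conditioning on the first transition gives the exact value $E[W] = 22/7$, which both returned means should approximate.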


Example 11.21 (A Queueing System) Let $D_{n+1}$ denote the delay in queue of the $(n+1)$st customer in a queueing system in which the interarrival times are independent and identically distributed (i.i.d.) with distribution $F$ having mean $\mu_F$ and are independent of the service times, which are i.i.d. with distribution $G$ having mean $\mu_G$. If $X_i$ is the interarrival time between arrival $i$ and $i + 1$, and if $S_i$ is the service time of customer $i$, $i \ge 1$, we may write

$$
D_{n+1} = g(X_1, \ldots, X_n, S_1, \ldots, S_n)
$$

To take into account the possibility that the simulated variables $X_i, S_i$ may by chance be quite different from what might be expected we can let

$$
f(X_1, \ldots, X_n, S_1, \ldots, S_n) = \sum_{i=1}^n (S_i - X_i)
$$

As $E[f(\mathbf{X}, \mathbf{S})] = n(\mu_G - \mu_F)$ we could use

$$
g(\mathbf{X}, \mathbf{S}) + a[f(\mathbf{X}, \mathbf{S}) - n(\mu_G - \mu_F)]
$$

as an estimator of $E[D_{n+1}]$. Since $D_{n+1}$ and $f$ are both increasing functions of $S_i, -X_i$, $i = 1, \ldots, n$, it follows from Theorem 11.1 that $f(\mathbf{X}, \mathbf{S})$ and $D_{n+1}$ are positively correlated, and so the simulated estimate of $a$ should turn out to be negative.
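The control variate of Example 11.21 can be tried out with the Python sketch below. It assumes a single-server first-come-first-served queue that starts empty, so that the delays follow the standard recursion $D_{i+1} = \max(0, D_i + S_i - X_i)$ with $D_1 = 0$, and it assumes exponential interarrival and service times; none of these specifics are fixed by the example itself.

```python
import random
import statistics


def delay_and_control(n, arrival_rate, service_rate, rng):
    """Return (D_{n+1}, sum_i (S_i - X_i)) for a FIFO single-server queue started empty."""
    delay, control = 0.0, 0.0
    for _ in range(n):
        s = rng.expovariate(service_rate)    # service time S_i (exponential by assumption)
        x = rng.expovariate(arrival_rate)    # interarrival time X_i (exponential by assumption)
        delay = max(0.0, delay + s - x)      # D_{i+1} = max(0, D_i + S_i - X_i)
        control += s - x
    return delay, control


def control_variate_delay(n, arrival_rate, service_rate, runs, rng):
    """Return (raw mean, control-variate mean, a, raw variance, control-variate variance)."""
    pairs = [delay_and_control(n, arrival_rate, service_rate, rng) for _ in range(runs)]
    ds = [d for d, _ in pairs]
    fs = [f for _, f in pairs]
    mean_f = n * (1.0 / service_rate - 1.0 / arrival_rate)   # n (mu_G - mu_F)
    d_bar, f_bar = statistics.fmean(ds), statistics.fmean(fs)
    cov = sum((d - d_bar) * (f - f_bar) for d, f in pairs) / (runs - 1)
    a = -cov / statistics.variance(fs)
    ws = [d + a * (f - mean_f) for d, f in pairs]
    return d_bar, statistics.fmean(ws), a, statistics.variance(ds), statistics.variance(ws)
```

A call such as `control_variate_delay(10, 1.0, 1.25, 20000, random.Random(13))` should return a negative `a`, in line with the positive correlation noted in the example.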

If we wanted to estimate the expected sum of the delays in queue of the first $N(T)$ arrivals, then we could use $\sum_{i=1}^{N(T)} S_i$ as our control variable. Indeed as the arrival process is usually assumed independent of the service times, it follows that

$$
E\left[\sum_{i=1}^{N(T)} S_i\right] = E[S]\,E[N(T)]
$$

where $E[N(T)]$ can either be computed by the method suggested in Section 7.8 or estimated from the simulation as in Example 11.18. This control variable could also be used if the arrival process were a nonhomogeneous Poisson with rate $\lambda(t)$; in this case,

$$
E[N(T)] = \int_0^T \lambda(t)\, dt
$$
■


11.6.4 Importance Sampling 


Let $\mathbf{X} = (X_1, \ldots, X_n)$ denote a vector of random variables having a joint density function (or joint mass function in the discrete case) $f(\mathbf{x}) = f(x_1, \ldots, x_n)$, and suppose that we are interested in estimating

$$
\theta = E[h(\mathbf{X})] = \int h(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x}
$$

where the preceding is an $n$-dimensional integral. (If the $X_i$ are discrete, then interpret the integral as an $n$-fold summation.)

Suppose that a direct simulation of the random vector $\mathbf{X}$, so as to compute values of $h(\mathbf{X})$, is inefficient, possibly because (a) it is difficult to simulate a random vector having density function $f(\mathbf{x})$, or (b) the variance of $h(\mathbf{X})$ is large, or (c) a combination of (a) and (b).

Another way in which we can use simulation to estimate $\theta$ is to note that if $g(\mathbf{x})$ is another probability density such that $f(\mathbf{x}) = 0$ whenever $g(\mathbf{x}) = 0$, then we can express $\theta$ as

$$
\theta = \int \frac{h(\mathbf{x}) f(\mathbf{x})}{g(\mathbf{x})}\, g(\mathbf{x})\, d\mathbf{x} = E_g\left[\frac{h(\mathbf{X}) f(\mathbf{X})}{g(\mathbf{X})}\right] \tag{11.14}
$$

where we have written $E_g$ to emphasize that the random vector $\mathbf{X}$ has joint density $g(\mathbf{x})$.

It follows from Equation (11.14) that $\theta$ can be estimated by successively generating values of a random vector $\mathbf{X}$ having density function $g(\mathbf{x})$ and then using as the estimator the average of the values of $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$. If a density function $g(\mathbf{x})$ can be chosen so that the random variable $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$ has a small variance then this approach, referred to as importance sampling, can result in an efficient estimator of $\theta$.

Let us now try to obtain a feel for why importance sampling can be useful. To begin, note that $f(\mathbf{X})$ and $g(\mathbf{X})$ represent the respective likelihoods of obtaining the vector $\mathbf{X}$ when $\mathbf{X}$ is a random vector with respective densities $f$ and $g$. Hence, if $\mathbf{X}$ is distributed according to $g$, then it will usually be the case that $f(\mathbf{X})$ will be small in relation to $g(\mathbf{X})$ and thus when $\mathbf{X}$ is simulated according to $g$ the likelihood ratio $f(\mathbf{X})/g(\mathbf{X})$ will usually be small in comparison to 1. However, it is easy to check that its mean is 1:

$$
E_g\left[\frac{f(\mathbf{X})}{g(\mathbf{X})}\right] = \int \frac{f(\mathbf{x})}{g(\mathbf{x})}\, g(\mathbf{x})\, d\mathbf{x} = \int f(\mathbf{x})\, d\mathbf{x} = 1
$$

Thus we see that even though $f(\mathbf{X})/g(\mathbf{X})$ is usually smaller than 1, its mean is equal to 1, thus implying that it is occasionally large and so will tend to have a large variance. So how can $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$ have a small variance? The answer is that we can sometimes arrange to choose a density $g$ such that those values of $\mathbf{x}$ for which $f(\mathbf{x})/g(\mathbf{x})$ is large are precisely the values for which $h(\mathbf{x})$ is exceedingly small, and thus the ratio $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$ is always small. Since this will require that $h(\mathbf{x})$ sometimes be small, importance sampling seems to work best when estimating a small probability; for in this case the function $h(\mathbf{x})$ is equal to 1 when $\mathbf{x}$ lies in some set and is equal to 0 otherwise.

We will now consider how to select an appropriate density $g$. We will find that
the so-called tilted densities are useful. Let $M(t) = E_f[e^{tX}] = \int e^{tx} f(x)\, dx$ be the
moment generating function corresponding to a one-dimensional density $f$.


Definition 11.2 A density function

\[
f_t(x) = \frac{e^{tx} f(x)}{M(t)}
\]

is called a tilted density of $f$, $-\infty < t < \infty$.

A random variable with density $f_t$ tends to be larger than one with density $f$ when
$t > 0$ and tends to be smaller when $t < 0$.
In certain cases the tilted distributions $f_t$ have the same parametric form as
does $f$.


Example 11.22 If $f$ is the exponential density with rate $\lambda$ then

\[
f_t(x) = C e^{tx} \lambda e^{-\lambda x} = \lambda C e^{-(\lambda - t)x}
\]

where $C = 1/M(t)$ does not depend on $x$. Therefore, for $t < \lambda$, $f_t$ is an exponential
density with rate $\lambda - t$.

If $f$ is a Bernoulli probability mass function with parameter $p$, then

\[
f(x) = p^x (1 - p)^{1-x}, \quad x = 0, 1
\]

Hence, $M(t) = E_f[e^{tX}] = pe^t + 1 - p$ and so

\[
f_t(x) = \frac{1}{M(t)} (pe^t)^x (1 - p)^{1-x}
       = \left(\frac{pe^t}{pe^t + 1 - p}\right)^{x} \left(\frac{1 - p}{pe^t + 1 - p}\right)^{1-x} \tag{11.15}
\]

That is, $f_t$ is the probability mass function of a Bernoulli random variable with
parameter

\[
p_t = \frac{pe^t}{pe^t + 1 - p}
\]

We leave it as an exercise to show that if $f$ is a normal density with parameters
$\mu$ and $\sigma^2$ then $f_t$ is a normal density with mean $\mu + \sigma^2 t$ and variance $\sigma^2$. ■


In certain situations the quantity of interest is the sum of the independent 
random variables X1,...,X,. In this case the joint density f is the product of 
one-dimensional densities. That is, 


\[
f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)
\]

where $f_i$ is the density function of $X_i$. In this situation it is often useful to
generate the $X_i$ according to their tilted densities, with a common choice of
$t$ employed.


Example 11.23 Let $X_1, \ldots, X_n$ be independent random variables having respective
probability density (or mass) functions $f_i$, for $i = 1, \ldots, n$. Suppose we
are interested in approximating the probability that their sum is at least as
large as $a$, where $a$ is much larger than the mean of the sum. That is, we are
interested in

\[
\theta = P\{S \ge a\}
\]

where $S = \sum_{i=1}^n X_i$, and where $a > \sum_{i=1}^n E[X_i]$. Letting $I\{S \ge a\}$ equal 1 if $S \ge a$
and letting it be 0 otherwise, we have that

\[
\theta = E_{\mathbf{f}}[I\{S \ge a\}]
\]

where $\mathbf{f} = (f_1, \ldots, f_n)$. Suppose now that we simulate $X_i$ according to the tilted
mass function $f_{i,t}$, $i = 1, \ldots, n$, with the value of $t$, $t > 0$, left to be determined.
The importance sampling estimator of $\theta$ would then be

\[
\hat{\theta} = I\{S \ge a\} \prod_{i=1}^n \frac{f_i(X_i)}{f_{i,t}(X_i)}
\]

Now,

\[
\frac{f_i(X_i)}{f_{i,t}(X_i)} = M_i(t) e^{-tX_i}
\]

and so

\[
\hat{\theta} = I\{S \ge a\} M(t) e^{-tS}
\]

where $M(t) = \prod_{i=1}^n M_i(t)$ is the moment generating function of $S$. Since $t > 0$ and
$I\{S \ge a\}$ is equal to 0 when $S < a$, it follows that

\[
I\{S \ge a\} e^{-tS} \le e^{-ta}
\]

and so

\[
\hat{\theta} \le M(t) e^{-ta}
\]

To make the bound on the estimator as small as possible we thus choose $t$, $t > 0$,
to minimize $M(t)e^{-ta}$. In doing so, we will obtain an estimator whose value on
each iteration is between 0 and $\min_t M(t)e^{-ta}$. It can be shown that the minimizing
$t$, call it $t^*$, is such that

\[
E_{t^*}[S] = E_{t^*}\!\left[\sum_{i=1}^n X_i\right] = a
\]

where, in the preceding, we mean that the expected value is to be taken under
the assumption that the distribution of $X_i$ is $f_{i,t^*}$ for $i = 1, \ldots, n$.

For instance, suppose that $X_1, \ldots, X_n$ are independent Bernoulli random variables
having respective parameters $p_i$, for $i = 1, \ldots, n$. Then, if we generate the $X_i$
according to their tilted mass functions $p_{i,t}$, $i = 1, \ldots, n$, the importance sampling
estimator of $\theta = P\{S \ge a\}$ is

\[
\hat{\theta} = I\{S \ge a\} e^{-tS} \prod_{i=1}^n (p_i e^t + 1 - p_i)
\]

Since $p_{i,t}$ is the mass function of a Bernoulli random variable with parameter
$p_i e^t/(p_i e^t + 1 - p_i)$ it follows that

\[
E_t\!\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \frac{p_i e^t}{p_i e^t + 1 - p_i}
\]

The value of $t$ that makes the preceding equal to $a$ can be numerically approximated
and then utilized in the simulation.
As an illustration, suppose that $n = 20$, $p_i = 0.4$, and $a = 16$. Then

\[
E_t[S] = 20\, \frac{0.4e^t}{0.4e^t + 0.6}
\]

Setting this equal to 16 yields, after a little algebra,

\[
e^{t^*} = 6
\]

Thus, if we generate the Bernoullis using the parameter

\[
\frac{0.4e^{t^*}}{0.4e^{t^*} + 0.6} = 0.8
\]

then because

\[
M(t^*) = (0.4e^{t^*} + 0.6)^{20} \quad \text{and} \quad e^{-t^*S} = (1/6)^S
\]

we see that the importance sampling estimator is

\[
\hat{\theta} = I\{S \ge 16\} (1/6)^S 3^{20}
\]

It follows from the preceding that

\[
\hat{\theta} \le (1/6)^{16} 3^{20} = 81/2^{16} = 0.001236
\]

That is, on each iteration the value of the estimator is between 0 and 0.001236.
Since, in this case, $\theta$ is the probability that a binomial random variable with
parameters 20, 0.4 is at least 16, it can be explicitly computed with the result
$\theta = 0.000317$. Hence, the raw simulation estimator $I$, which on each iteration
takes the value 0 if the sum of the Bernoullis with parameter 0.4 is less than 16
and takes the value 1 otherwise, will have variance

\[
\mathrm{Var}(I) = \theta(1 - \theta) = 3.169 \times 10^{-4}
\]

On the other hand, it follows from the fact that $0 \le \hat{\theta} \le 0.001236$ that (see
Exercise 33)

\[
\mathrm{Var}(\hat{\theta}) \le 2.9131 \times 10^{-7}
\]
■
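
The numerical illustration in Example 11.23 can be checked with a short simulation. The sketch below assumes all the Bernoulli parameters are equal, so that $t^*$ has a closed form, and the function name `tilted_bernoulli_values` is hypothetical rather than anything from the text.

```python
import math
import random


def tilted_bernoulli_values(n, p, a, runs, rng):
    """Return `runs` values of the importance sampling estimator of P{S >= a},
    where S is the sum of n independent Bernoulli(p) variables."""
    target = a / n                                  # tilted success probability, so that E_t[S] = a
    e_t = target * (1 - p) / (p * (1 - target))     # e^{t*}, solved from p e^t / (p e^t + 1 - p) = a/n
    p_t = p * e_t / (p * e_t + 1 - p)               # equals `target`
    m_t = (p * e_t + 1 - p) ** n                    # M(t*), moment generating function of S at t*
    values = []
    for _ in range(runs):
        s = sum(1 for _ in range(n) if rng.random() < p_t)
        values.append(m_t * e_t ** (-s) if s >= a else 0.0)
    return values


rng = random.Random(2024)
values = tilted_bernoulli_values(n=20, p=0.4, a=16, runs=50_000, rng=rng)
estimate = sum(values) / len(values)
exact = sum(math.comb(20, k) * 0.4 ** k * 0.6 ** (20 - k) for k in range(16, 21))
print(estimate, exact, max(values))
```

With these parameters $e^{t^*} = 6$ and the tilted parameter is 0.8, so every value returned lies between 0 and $81/2^{16}$, in agreement with the bound derived in the example.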


Example 11.24 Consider a single-server queue in which the times between successive
customer arrivals have density function $f$ and the service times have density
$g$. Let $D_n$ denote the amount of time that the $n$th arrival spends waiting
in queue and suppose we are interested in estimating $\alpha = P\{D_n > a\}$ when $a$ is
much larger than $E[D_n]$. Rather than generating the successive interarrival and
service times according to $f$ and $g$, respectively, they should be generated according
to the densities $f_{-t}$ and $g_t$, where $t$ is a positive number to be determined.
Note that using these distributions as opposed to $f$ and $g$ will result in smaller
interarrival times (since $-t < 0$) and larger service times. Hence, there will be a
greater chance that $D_n > a$ than if we had simulated using the densities $f$ and $g$.
The importance sampling estimator of $\alpha$ would then be

\[
\hat{\alpha} = I\{D_n > a\}\, e^{t(S_n - Y_n)} [M_f(-t) M_g(t)]^n
\]

where $S_n$ is the sum of the first $n$ interarrival times, $Y_n$ is the sum of the first $n$
service times, and $M_f$ and $M_g$ are the moment generating functions of the densities
$f$ and $g$, respectively. The value of $t$ used should be determined by experimenting
with a variety of different choices. ■


11.7 Determining the Number of Runs 


Suppose that we are going to use simulation to generate $r$ independent and identically
distributed random variables $Y^{(1)}, \ldots, Y^{(r)}$ having mean $\mu$ and variance
$\sigma^2$. We are then going to use

\[
\bar{Y}_r = \frac{Y^{(1)} + \cdots + Y^{(r)}}{r}
\]

as an estimate of $\mu$. The precision of this estimate can be measured by its variance

\[
\mathrm{Var}(\bar{Y}_r) = E[(\bar{Y}_r - \mu)^2] = \sigma^2/r
\]

Hence, we would want to choose $r$, the number of necessary runs, large enough
so that $\sigma^2/r$ is acceptably small. However, the difficulty is that $\sigma^2$ is not known in
advance. To get around this, you should initially simulate $k$ runs (where $k \ge 30$)
and then use the simulated values $Y^{(1)}, \ldots, Y^{(k)}$ to estimate $\sigma^2$ by the sample
variance

\[
\sum_{i=1}^{k} (Y^{(i)} - \bar{Y}_k)^2/(k - 1)
\]

Based on this estimate of $\sigma^2$ the value of $r$ that attains the desired level of precision
can now be determined and an additional $r - k$ runs can be generated.
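
The two-stage procedure above might be sketched as follows. The function `runs_needed` and the precision target (a standard deviation of at most `target_std_error` for the final average) are assumptions made for illustration; the text itself only asks that $\sigma^2/r$ be acceptably small.

```python
import math
import random
import statistics


def runs_needed(pilot_values, target_std_error):
    """Smallest r with s^2 / r <= target_std_error^2, where s^2 is the sample
    variance of the pilot runs (never fewer than the pilot runs already made)."""
    s2 = statistics.variance(pilot_values)  # divides by k - 1
    return max(len(pilot_values), math.ceil(s2 / target_std_error ** 2))


# Illustration with hypothetical output data: each run returns an exponential with mean 1.
rng = random.Random(7)
k = 30
pilot = [rng.expovariate(1.0) for _ in range(k)]
r = runs_needed(pilot, target_std_error=0.05)
extra = [rng.expovariate(1.0) for _ in range(r - k)]  # the additional r - k runs
estimate = statistics.fmean(pilot + extra)
print(r, estimate)
```

Because the sample variance is itself an estimate, the value of $r$ obtained this way is only approximate; it can be recomputed once the additional runs are available.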


11.8 Generating from the Stationary Distribution 
of a Markov Chain 


11.8.1 Coupling from the Past 


Consider an irreducible Markov chain with states $1, \ldots, m$ and transition probabilities
$P_{ij}$ and suppose we want to generate the value of a random variable
whose distribution is that of the stationary distribution of this Markov chain.
Whereas we could approximately generate such a random variable by arbitrarily
choosing an initial state, simulating the resulting Markov chain for a large fixed
number of time periods, and then choosing the final state as the value of the random
variable, we will now present a procedure that generates a random variable
whose distribution is exactly that of the stationary distribution.

If, in theory, we generated the Markov chain starting at time $-\infty$ in any arbitrary
state, then the state at time 0 would have the stationary distribution. So
imagine that we do this, and suppose that a different person is to generate the
next state at each of these times. Thus, if $X(-n)$, the state at time $-n$, is $i$, then
person $-n$ would generate a random variable that is equal to $j$ with probability
$P_{ij}$, $j = 1, \ldots, m$, and the value generated would be the state at time $-(n - 1)$.
Now suppose that person $-1$ wants to do his random variable generation early.
Because he does not know what the state at time $-1$ will be, he generates a
sequence of random variables $N_{-1}(i)$, $i = 1, \ldots, m$, where $N_{-1}(i)$, the next state
if $X(-1) = i$, is equal to $j$ with probability $P_{ij}$, $j = 1, \ldots, m$. If it results that
$X(-1) = i$, then person $-1$ would report that the state at time 0 is

\[
S_{-1}(i) = N_{-1}(i), \quad i = 1, \ldots, m
\]

(That is, $S_{-1}(i)$ is the simulated state at time 0 when the simulated state at time
$-1$ is $i$.)

Now suppose that person $-2$, hearing that person $-1$ is doing his simulation
early, decides to do the same thing. She generates a sequence of random variables
$N_{-2}(i)$, $i = 1, \ldots, m$, where $N_{-2}(i)$ is equal to $j$ with probability $P_{ij}$, $j = 1, \ldots, m$.
Consequently, if it is reported to her that $X(-2) = i$, then she will report that
$X(-1) = N_{-2}(i)$. Combining this with the early generation of person $-1$ shows
that if $X(-2) = i$, then the simulated state at time 0 is

\[
S_{-2}(i) = S_{-1}(N_{-2}(i)), \quad i = 1, \ldots, m
\]

Continuing in the preceding manner, suppose that person $-3$ generates a
sequence of random variables $N_{-3}(i)$, $i = 1, \ldots, m$, where $N_{-3}(i)$ is to be the
generated value of the next state when $X(-3) = i$. Consequently, if $X(-3) = i$
then the simulated state at time 0 would be

\[
S_{-3}(i) = S_{-2}(N_{-3}(i)), \quad i = 1, \ldots, m
\]

Now suppose we continue the preceding, and so obtain the simulated functions

\[
S_{-1}(i), S_{-2}(i), S_{-3}(i), \ldots, \quad i = 1, \ldots, m
\]

Going backward in time in this manner, we will at some time, say $-r$, have a
simulated function $S_{-r}(i)$ that is a constant function. That is, for some state $j$,
$S_{-r}(i)$ will equal $j$ for all states $i = 1, \ldots, m$. But this means that no matter what
the simulated values from time $-\infty$ to $-r$, we can be certain that the simulated
value at time 0 is $j$. Consequently, $j$ can be taken as the value of a generated
random variable whose distribution is exactly that of the stationary distribution
of the Markov chain.


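Example 11.25 Consider a Markov chain with states 1, 2, 3 and suppose that
simulation yielded the values

\[
N_{-1}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 2, & \text{if } i = 3 \end{cases}
\]

and

\[
N_{-2}(i) = \begin{cases} 1, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}
\]

Then

\[
S_{-2}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}
\]

If

\[
N_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 1, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}
\]

then

\[
S_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}
\]

Therefore, no matter what the state is at time $-3$, the state at time 0 will
be 3. ■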


Remark The procedure developed in this section for generating a random vari- 
able whose distribution is the stationary distribution of the Markov chain is called 
coupling from the past. 
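
A minimal sketch of coupling from the past, under stated assumptions: states are numbered from 0 rather than 1, the next-state values for the different states at each time are generated independently of one another, and the example transition matrix is hypothetical and has all entries positive, so a constant function is reached with probability 1. The helper names are illustrative only.

```python
import random


def sample_next(row, rng):
    """Generate a next state j with probability row[j]."""
    u = rng.random()
    cumulative = 0.0
    for j, prob in enumerate(row):
        cumulative += prob
        if u < cumulative:
            return j
    return len(row) - 1  # guard against floating-point round-off


def compose(s_prev, n_map):
    """Given S_{-(n-1)} and N_{-n}, return S_{-n}(i) = S_{-(n-1)}(N_{-n}(i))."""
    return [s_prev[n_map[i]] for i in range(len(n_map))]


def coupling_from_the_past(P, rng):
    """Go backward in time until the simulated function is constant; return its value."""
    m = len(P)
    s = list(range(m))  # before any step is generated, the state at time 0 is the state itself
    while len(set(s)) > 1:
        n_map = [sample_next(P[i], rng) for i in range(m)]  # N_{-n}(i) for every state i
        s = compose(s, n_map)
    return s[0]


P = [[0.5, 0.5],
     [0.2, 0.8]]  # hypothetical chain with stationary probabilities (2/7, 5/7)
rng = random.Random(99)
draws = [coupling_from_the_past(P, rng) for _ in range(20_000)]
print(draws.count(0) / len(draws))  # should be near 2/7
```

The newly generated map is always applied first (it acts at the earliest time), which is what distinguishes this from simply running the chain forward until all starting states have merged.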


11.8.2 Another Approach 


Consider a Markov chain whose state space is the nonnegative integers. Suppose
the chain has stationary probabilities, and denote them by $\pi_i$, $i \ge 0$. We now
present another way of simulating a random variable whose distribution is given
by the $\pi_i$, $i \ge 0$, which can be utilized if the chain satisfies the following property.
Namely, that for some state, which we will call state 0, and some positive
number $\alpha$

\[
P_{i,0} \ge \alpha > 0
\]

for all states $i$. That is, whatever the current state, the probability that the next
state will be 0 is at least some positive value $\alpha$.

To simulate a random variable distributed according to the stationary probabilities,
start by simulating the Markov chain in the obvious manner. Namely,
whenever the chain is in state $i$, generate a random variable that is equal to $j$ with
probability $P_{ij}$, $j \ge 0$, and then set the next state equal to the generated value
of this random variable. In addition, however, whenever a transition into state 0
occurs a coin, whose probability of coming up heads depends on the state from
which the transition occurred, is flipped. Specifically, if the transition into state 0
was from state $i$, then the coin flipped has probability $\alpha/P_{i,0}$ of coming up heads.
Call such a coin an $i$-coin, $i \ge 0$. If the coin comes up heads then we say that an
event has occurred. Consequently, each transition of the Markov chain results in
an event with probability $\alpha$, implying that events occur at rate $\alpha$. Now say that
an event is an $i$-event if it resulted from a transition out of state $i$; that is, an event
is an $i$-event if it resulted from the flip of an $i$-coin. Because $\pi_i$ is the proportion
of transitions that are out of state $i$, and each such transition will result in an
$i$-event with probability $\alpha$, it follows that the rate at which $i$-events occur is $\alpha\pi_i$.
Therefore, the proportion of all events that are $i$-events is $\alpha\pi_i/\alpha = \pi_i$, $i \ge 0$.

Now, suppose that $X_0 = 0$. Fix $i$, and let $I_j$ equal 1 if the $j$th event that occurs is
an $i$-event, and let $I_j$ equal 0 otherwise. Because an event always leaves the chain
in state 0 it follows that $I_j$, $j \ge 1$, are independent and identically distributed
random variables. Because the proportion of the $I_j$ that are equal to 1 is $\pi_i$, we
see that

\[
\pi_i = \lim_{n \to \infty} \frac{I_1 + \cdots + I_n}{n} = E[I_1] = P(I_1 = 1)
\]

where the second equality follows from the strong law of large numbers. Hence,
if we let

\[
T = \min\{n > 0 : \text{an event occurs at time } n\}
\]

denote the time of the first event, then it follows from the preceding that

\[
\pi_i = P(I_1 = 1) = P(X_{T-1} = i)
\]

As the preceding is true for all states $i$, it follows that $X_{T-1}$, the state of the
Markov chain at time $T - 1$, has the stationary distribution.
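
The procedure of this subsection can be sketched as follows. For simplicity the sketch assumes a finite state space given as a transition matrix (a hypothetical two-state chain with $P_{i,0} \ge \alpha = 0.2$); the text allows all nonnegative integers, and the function names are illustrative.

```python
import random


def sample_next(row, rng):
    """Generate a next state j with probability row[j]."""
    u = rng.random()
    cumulative = 0.0
    for j, prob in enumerate(row):
        cumulative += prob
        if u < cumulative:
            return j
    return len(row) - 1  # guard against floating-point round-off


def sample_stationary(P, alpha, rng):
    """Start in state 0 and run the chain until the first event; return X_{T-1}."""
    state = 0
    while True:
        nxt = sample_next(P[state], rng)
        # On a transition into state 0, flip the coin for the state we came from;
        # it comes up heads (an event) with probability alpha / P[state][0].
        if nxt == 0 and rng.random() < alpha / P[state][0]:
            return state
        state = nxt


P = [[0.5, 0.5],
     [0.2, 0.8]]  # hypothetical chain; stationary probabilities are (2/7, 5/7)
alpha = 0.2       # P[i][0] >= alpha for every state i
rng = random.Random(314)
draws = [sample_stationary(P, alpha, rng) for _ in range(20_000)]
print(draws.count(0) / len(draws))  # should be near 2/7
```

A larger $\alpha$ makes events more frequent and so shortens each run, so in practice one would take $\alpha$ as large as the condition $P_{i,0} \ge \alpha$ permits.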


Exercises 


*1. Suppose it is relatively easy to simulate from the distributions $F_i$, $i = 1, 2, \ldots, n$. If
$n$ is small, how can we simulate from

\[
F(x) = \sum_{i=1}^{n} P_i F_i(x), \quad P_i \ge 0, \quad \sum_i P_i = 1?
\]

Give a method for simulating from

\[
F(x) = \begin{cases} \dfrac{1 - e^{-2x} + 2x}{3}, & 0 < x < 1 \\[2ex] \dfrac{3 - e^{-2x}}{3}, & 1 < x < \infty \end{cases}
\]

2. Give a method for simulating a negative binomial random variable.

*3. Give a method for simulating a hypergeometric random variable.

4. Suppose we want to simulate a point located at random in a circle of radius $r$
centered at the origin. That is, we want to simulate $X$, $Y$ having joint density

\[
f(x, y) = \frac{1}{\pi r^2}, \quad x^2 + y^2 \le r^2
\]

(a) Let $R = \sqrt{X^2 + Y^2}$, $\theta = \tan^{-1} Y/X$ denote the polar coordinates. Compute
the joint density of $R$, $\theta$ and use this to give a simulation method. Another
method for simulating $X$, $Y$ is as follows:

Step 1: Generate independent random numbers $U_1$, $U_2$ and set $Z_1 = 2rU_1 - r$,
$Z_2 = 2rU_2 - r$. Then $Z_1$, $Z_2$ is uniform in the square whose
sides are of length $2r$ and which encloses the circle of radius $r$ (see
Figure 11.6).

Step 2: If $(Z_1, Z_2)$ lies in the circle of radius $r$, that is, if $Z_1^2 + Z_2^2 \le r^2$, set
$(X, Y) = (Z_1, Z_2)$. Otherwise return to step 1.

(b) Prove that this method works, and compute the distribution of the number of
random numbers it requires.

5. Suppose it is relatively easy to simulate from $F_i$ for each $i = 1, \ldots, n$. How can we
simulate from

(a) $F(x) = \prod_{i=1}^{n} F_i(x)$?

(b) $F(x) = 1 - \prod_{i=1}^{n} (1 - F_i(x))$?

(c) Give two methods for simulating from the distribution $F(x) = x^n$, $0 < x < 1$.

*6. In Example 11.4 we simulated the absolute value of a standard normal by using the
Von Neumann rejection procedure on exponential random variables with rate 1.
This raises the question of whether we could obtain a more efficient algorithm
by using a different exponential density, that is, we could use the density $g(x) = \lambda e^{-\lambda x}$.
Show that the mean number of iterations needed in the rejection scheme is
minimized when $\lambda = 1$.

7. Give an algorithm for simulating a random variable having density function

\[
f(x) = 30(x^2 - 2x^3 + x^4), \quad 0 < x < 1
\]

[Figure 11.6: the circle of radius r inside the enclosing square with sides of length 2r.]


8. Consider the technique of simulating a gamma $(n, \lambda)$ random variable by using the
rejection method with $g$ being an exponential density with rate $\lambda/n$.
(a) Show that the average number of iterations of the algorithm needed to generate
a gamma is $n^n e^{1-n}/(n - 1)!$.
(b) Use Stirling's approximation to show that for large $n$ the answer to part (a) is
approximately equal to $e[(n - 1)/(2\pi)]^{1/2}$.
(c) Show that the procedure is equivalent to the following:
Step 1: Generate $Y_1$ and $Y_2$, independent exponentials with rate 1.
Step 2: If $Y_1 < (n - 1)[Y_2 - \log(Y_2) - 1]$, return to step 1.
Step 3: Set $X = nY_2/\lambda$.
(d) Explain how to obtain an independent exponential along with a gamma from
the preceding algorithm.

9. Set up the alias method for simulating from a binomial random variable with parameters
$n = 6$, $p = 0.4$.

10. Explain how we can number the $Q^{(k)}$ in the alias method so that $k$ is one of the
two points that $Q^{(k)}$ gives weight.
Hint: Rather than giving the initial $Q$ the name $Q^{(1)}$, what else could we call it?

11. Complete the details of Example 11.10.

12. Let $X_1, \ldots, X_k$ be independent with

\[
P\{X_i = j\} = \frac{1}{n}, \quad j = 1, \ldots, n, \quad i = 1, \ldots, k
\]

If $D$ is the number of distinct values among $X_1, \ldots, X_k$ show that

\[
E[D] = n\left[1 - \left(\frac{n - 1}{n}\right)^{k}\right] \approx k - \frac{k^2}{2n} \quad \text{when } \frac{k^2}{n} \text{ is small}
\]

13. The Discrete Rejection Method: Suppose we want to simulate $X$ having probability
mass function $P\{X = i\} = P_i$, $i = 1, \ldots, n$ and suppose we can easily simulate
from the probability mass function $Q_i$, $\sum_i Q_i = 1$, $Q_i \ge 0$. Let $C$ be such that
$P_i \le CQ_i$, $i = 1, \ldots, n$. Show that the following algorithm generates the desired
random variable:
Step 1: Generate $Y$ having mass function $Q$ and $U$ an independent random
number.
Step 2: If $U \le P_Y/CQ_Y$, set $X = Y$. Otherwise return to step 1.

14. The Discrete Hazard Rate Method: Let $X$ denote a nonnegative integer valued
random variable. The function $\lambda(n) = P\{X = n \mid X \ge n\}$, $n \ge 0$, is called the
discrete hazard rate function.

(a) Show that $P\{X = n\} = \lambda(n) \prod_{i=0}^{n-1} (1 - \lambda(i))$.

(b) Show that we can simulate $X$ by generating random numbers $U_1, U_2, \ldots$ stopping
at

\[
X = \min\{n : U_n \le \lambda(n)\}
\]

(c) Apply this method to simulating a geometric random variable. Explain, intuitively,
why it works.

(d) Suppose that $\lambda(n) \le p < 1$ for all $n$. Consider the following algorithm for simulating
$X$ and explain why it works: Simulate $X_i$, $U_i$, $i \ge 1$ where $X_i$ is geometric
with mean $1/p$ and $U_i$ is a random number. Set $S_k = X_1 + \cdots + X_k$
and let

\[
X = \min\{S_k : U_k \le \lambda(S_k)/p\}
\]

15. Suppose you have just simulated a normal random variable $X$ with mean $\mu$ and
variance $\sigma^2$. Give an easy way to generate a second normal variable with the same
mean and variance that is negatively correlated with $X$.

*16. Suppose $n$ balls having weights $w_1, w_2, \ldots, w_n$ are in an urn. These balls are sequentially
removed in the following manner: At each selection, a given ball in the urn is
chosen with a probability equal to its weight divided by the sum of the weights of
the other balls that are still in the urn. Let $I_1, I_2, \ldots, I_n$ denote the order in which
the balls are removed; thus $I_1, \ldots, I_n$ is a random permutation with weights.

(a) Give a method for simulating $I_1, \ldots, I_n$.

(b) Let $X_i$ be independent exponentials with rates $w_i$, $i = 1, \ldots, n$. Explain how
$X_i$ can be utilized to simulate $I_1, \ldots, I_n$.

17. Order Statistics: Let $X_1, \ldots, X_n$ be i.i.d. from a continuous distribution $F$, and let
$X_{(i)}$ denote the $i$th smallest of $X_1, \ldots, X_n$, $i = 1, \ldots, n$. Suppose we want to simulate
$X_{(1)} < X_{(2)} < \cdots < X_{(n)}$. One approach is to simulate $n$ values from $F$, and then
order these values. However, this ordering, or sorting, can be time consuming when
$n$ is large.

(a) Suppose that $\lambda(t)$, the hazard rate function of $F$, is bounded. Show how the
hazard rate method can be applied to generate the variables in such a manner
that no sorting is necessary.

Suppose now that $F^{-1}$ is easily computed.

(b) Argue that $X_{(1)}, \ldots, X_{(n)}$ can be generated by simulating $U_{(1)} < U_{(2)} < \cdots < U_{(n)}$,
the ordered values of $n$ independent random numbers, and then setting
$X_{(i)} = F^{-1}(U_{(i)})$. Explain why this means that $X_{(i)}$ can be generated from
$F^{-1}(\beta_i)$ where $\beta_i$ is beta with parameters $i$, $n - i + 1$.

(c) Argue that $U_{(1)}, \ldots, U_{(n)}$ can be generated, without any need for sorting, by
simulating i.i.d. exponentials $Y_1, \ldots, Y_{n+1}$ and then setting

\[
U_{(i)} = \frac{Y_1 + \cdots + Y_i}{Y_1 + \cdots + Y_{n+1}}, \quad i = 1, \ldots, n
\]

Hint: Given the time of the $(n + 1)$st event of a Poisson process, what can be said
about the set of times of the first $n$ events?

(d) Show that if $U_{(n)} = y$ then $U_{(1)}, \ldots, U_{(n-1)}$ has the same joint distribution as
the order statistics of a set of $n - 1$ uniform $(0, y)$ random variables.

(e) Use part (d) to show that $U_{(1)}, \ldots, U_{(n)}$ can be generated as follows:
Step 1: Generate random numbers $U_1, \ldots, U_n$.
Step 2: Set

\[
U_{(n)} = U_1^{1/n}, \quad U_{(n-1)} = U_{(n)} (U_2)^{1/(n-1)},
\]
\[
U_{(j-1)} = U_{(j)} (U_{n-j+2})^{1/(j-1)}, \quad j = 2, \ldots, n - 1
\]

18. Let $X_1, \ldots, X_n$ be independent exponential random variables each having rate 1.
Set

\[
W_1 = X_1/n, \qquad W_i = W_{i-1} + \frac{X_i}{n - i + 1}, \quad i = 2, \ldots, n
\]

Explain why $W_1, \ldots, W_n$ has the same joint distribution as the order statistics of a
sample of $n$ exponentials each having rate 1.

19. Suppose we want to simulate a large number $n$ of independent exponentials with
rate 1; call them $X_1, X_2, \ldots, X_n$. If we were to employ the inverse transform technique
we would require one logarithmic computation for each exponential generated.
One way to avoid this is to first simulate $S_n$, a gamma random variable with
parameters $(n, 1)$ (say, by the method of Section 11.3.3). Now interpret $S_n$ as the
time of the $n$th event of a Poisson process with rate 1 and use the result that given
$S_n$ the set of the first $n - 1$ event times is distributed as the set of $n - 1$ independent
uniform $(0, S_n)$ random variables. Based on this, explain why the following
algorithm simulates $n$ independent exponentials:

Step 1: Generate $S_n$, a gamma random variable with parameters $(n, 1)$.
Step 2: Generate $n - 1$ random numbers $U_1, U_2, \ldots, U_{n-1}$.
Step 3: Order the $U_i$, $i = 1, \ldots, n - 1$ to obtain $U_{(1)} < U_{(2)} < \cdots < U_{(n-1)}$.
Step 4: Let $U_{(0)} = 0$, $U_{(n)} = 1$, and set $X_i = S_n(U_{(i)} - U_{(i-1)})$, $i = 1, \ldots, n$.

When the ordering (step 3) is performed according to the algorithm described in
Section 11.5, the preceding is an efficient method for simulating $n$ exponentials
when all $n$ are simultaneously required. If memory space is limited, however, and
the exponentials can be employed sequentially, discarding each exponential from
memory once it has been used, then the preceding may not be appropriate.

20. Consider the following procedure for randomly choosing a subset of size $k$ from
the numbers $1, 2, \ldots, n$: Fix $p$ and generate the first $n$ time units of a renewal
process whose interarrival distribution is geometric with mean $1/p$, that is,
$P\{\text{interarrival time} = k\} = p(1 - p)^{k-1}$, $k = 1, 2, \ldots$. Suppose events occur at times
$i_1 < i_2 < \cdots < i_m \le n$. If $m = k$, stop; $i_1, \ldots, i_m$ is the desired set. If $m > k$,
then randomly choose (by some method) a subset of size $k$ from $i_1, \ldots, i_m$ and then
stop. If $m < k$, take $i_1, \ldots, i_m$ as part of the subset of size $k$ and then select (by
some method) a random subset of size $k - m$ from the set $\{1, 2, \ldots, n\} - \{i_1, \ldots, i_m\}$.
Explain why this algorithm works. As $E[N(n)] = np$ a reasonable choice of $p$ is to
take $p \approx k/n$. (This approach is due to Dieter.)

21. Consider the following algorithm for generating a random permutation of the
elements $1, 2, \ldots, n$. In this algorithm, $P(i)$ can be interpreted as the element in
position $i$.

Step 1: Set $k = 1$.
Step 2: Set $P(1) = 1$.
Step 3: If $k = n$, stop. Otherwise, let $k = k + 1$.
Step 4: Generate a random number $U$, and let

\[
P(k) = P([kU] + 1), \qquad P([kU] + 1) = k.
\]

Go to step 3.

(a) Explain in words what the algorithm is doing.

(b) Show that at iteration $k$, that is, when the value of $P(k)$ is initially set,
$P(1), P(2), \ldots, P(k)$ is a random permutation of $1, 2, \ldots, k$.

Hint: Use induction and argue that

\[
P_k\{i_1, i_2, \ldots, i_{j-1}, k, i_j, \ldots, i_{k-2}, i\}
= P_{k-1}\{i_1, i_2, \ldots, i_{j-1}, i, i_j, \ldots, i_{k-2}\}\, \frac{1}{k}
= \frac{1}{k!} \quad \text{by the induction hypothesis}
\]

The preceding algorithm can be used even if $n$ is not initially known.

22. Verify that if we use the hazard rate approach to simulate the event times of a nonhomogeneous
Poisson process whose intensity function $\lambda(t)$ is such that $\lambda(t) \le \lambda$,
then we end up with the approach given in method 1 of Section 11.5.

23. For a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $t \ge 0$, where
$\int_0^\infty \lambda(t)\, dt = \infty$, let $X_1, X_2, \ldots$ denote the sequence of times at which events
occur.

(a) Show that $\int_0^{X_1} \lambda(t)\, dt$ is exponential with rate 1.

(b) Show that $\int_{X_{i-1}}^{X_i} \lambda(t)\, dt$, $i \ge 1$, are independent exponentials with rate 1, where
$X_0 = 0$.

In words, independent of the past, the additional amount of hazard that must be
experienced until an event occurs is exponential with rate 1.

24. Give an efficient method for simulating a nonhomogeneous Poisson process with
intensity function

\[
\lambda(t) = b + \frac{1}{t + a}, \quad t \ge 0
\]

25. Let $(X, Y)$ be uniformly distributed in a circle of radius $r$ about the origin. That is,
their joint density is given by

\[
f(x, y) = \frac{1}{\pi r^2}, \quad 0 \le x^2 + y^2 \le r^2
\]

Let $R = \sqrt{X^2 + Y^2}$ and $\theta = \arctan Y/X$ denote their polar coordinates. Show that
$R$ and $\theta$ are independent with $\theta$ being uniform on $(0, 2\pi)$ and $P\{R < a\} = a^2/r^2$,
$0 < a < r$.
26. Let $R$ denote a region in the two-dimensional plane. Show that for a two-dimensional
Poisson process, given that there are $n$ points located in $R$, the
points are independently and uniformly distributed in $R$; that is, their density is
$f(x, y) = c$, $(x, y) \in R$ where $c$ is the inverse of the area of $R$.

27. Let $X_1, \ldots, X_n$ be independent random variables with $E[X_i] = \theta$, $\mathrm{Var}(X_i) = \sigma_i^2$,
$i = 1, \ldots, n$, and consider estimates of $\theta$ of the form $\sum_{i=1}^n \lambda_i X_i$ where $\sum_{i=1}^n \lambda_i = 1$.
Show that $\mathrm{Var}\left(\sum_{i=1}^n \lambda_i X_i\right)$ is minimized when

\[
\lambda_i = \frac{1/\sigma_i^2}{\sum_{j=1}^n 1/\sigma_j^2}, \quad i = 1, \ldots, n
\]

Possible Hint: If you cannot do this for general $n$, try it first when $n = 2$.

The following two problems are concerned with the estimation of $\int_0^1 g(x)\, dx = E[g(U)]$
where $U$ is uniform (0, 1).

28. The Hit-Miss Method: Suppose $g$ is bounded in $[0, 1]$; for instance, suppose
$0 \le g(x) \le b$ for $x \in [0, 1]$. Let $U_1$, $U_2$ be independent random numbers and set
$X = U_1$, $Y = bU_2$, so the point $(X, Y)$ is uniformly distributed in a rectangle of
length 1 and height $b$. Now set

\[
I = \begin{cases} 1, & \text{if } Y < g(X) \\ 0, & \text{otherwise} \end{cases}
\]

That is, accept $(X, Y)$ if it falls in the shaded area of Figure 11.7.
(a) Show that $E[bI] = \int_0^1 g(x)\, dx$.
(b) Show that $\mathrm{Var}(bI) \ge \mathrm{Var}(g(U))$, and so hit-miss has larger variance than
simply computing $g$ of a random number.

29. Stratified Sampling: Let $U_1, \ldots, U_n$ be independent random numbers and set
$\bar{U}_i = (U_i + i - 1)/n$, $i = 1, \ldots, n$. Hence, $\bar{U}_i$, $i \ge 1$, is uniform on $((i - 1)/n, i/n)$.
$\sum_{i=1}^n g(\bar{U}_i)/n$ is called the stratified sampling estimator of $\int_0^1 g(x)\, dx$.

(a) Show that $E\left[\sum_{i=1}^n g(\bar{U}_i)/n\right] = \int_0^1 g(x)\, dx$.

(b) Show that $\mathrm{Var}\left[\sum_{i=1}^n g(\bar{U}_i)/n\right] \le \mathrm{Var}\left[\sum_{i=1}^n g(U_i)/n\right]$.

[Figure 11.7: the rectangle with corners (0, 0), (1, 0), (1, b), (0, b) containing the curve g(x), with the area under the curve shaded.]
Hint: Let $U$ be uniform (0, 1) and define $N$ by $N = i$ if $(i - 1)/n < U < i/n$,
$i = 1, \ldots, n$. Now use the conditional variance formula to obtain

\[
\begin{aligned}
\mathrm{Var}(g(U)) &= E[\mathrm{Var}(g(U) \mid N)] + \mathrm{Var}(E[g(U) \mid N]) \\
&\ge E[\mathrm{Var}(g(U) \mid N)] \\
&= \sum_{i=1}^{n} \frac{\mathrm{Var}(g(U) \mid N = i)}{n} = \sum_{i=1}^{n} \frac{\mathrm{Var}[g(\bar{U}_i)]}{n}
\end{aligned}
\]

30. If $f$ is the density function of a normal random variable with mean $\mu$ and variance
$\sigma^2$, show that the tilted density $f_t$ is the density of a normal random variable with
mean $\mu + \sigma^2 t$ and variance $\sigma^2$.

31. Consider a queueing system in which each service time, independent of the past,
has mean $\mu$. Let $W_n$ and $D_n$ denote, respectively, the amounts of time customer $n$
spends in the system and in queue. Hence, $D_n = W_n - S_n$ where $S_n$ is the service
time of customer $n$. Therefore,

\[
E[D_n] = E[W_n] - \mu
\]

If we use simulation to estimate $E[D_n]$, should we

(a) use the simulated data to determine $D_n$, which is then used as an estimate of
$E[D_n]$; or

(b) use the simulated data to determine $W_n$ and then use this quantity minus $\mu$ as
an estimate of $E[D_n]$?

Repeat for when we want to estimate $E[W_n]$.

*32. Show that if $X$ and $Y$ have the same distribution then

\[
\mathrm{Var}((X + Y)/2) \le \mathrm{Var}(X)
\]

Hence, conclude that the use of antithetic variables can never increase variance
(though it need not be as efficient as generating an independent set of random
numbers).

33. If $0 \le X \le a$, show that

(a) $E[X^2] \le aE[X]$,

(b) $\mathrm{Var}(X) \le E[X](a - E[X])$,

(c) $\mathrm{Var}(X) \le a^2/4$.

34. Suppose in Example 11.19 that no new customers are allowed in the system after
time $t_0$. Give an efficient simulation estimator of the expected additional time after
$t_0$ until the system becomes empty.

35. Suppose we are able to simulate independent random variables $X$ and $Y$. If we
simulate $2k$ independent random variables $X_1, \ldots, X_k$ and $Y_1, \ldots, Y_k$, where the
$X_i$ have the same distribution as does $X$, and the $Y_j$ have the same distribution as
does $Y$, how would you use them to estimate $P(X < Y)$?

36. If $U_1$, $U_2$, $U_3$ are independent uniform (0, 1) random variables, find
$P\left(\prod_{i=1}^{3} U_i > 0.1\right)$.

Hint: Relate the desired probability to one about a Poisson process.
Solutions to Starred Exercises


Chapter 1 


2. S={(r,g), (7, 5), (g,1), (g, 5), (br), (B, g)} where, for instance, (r,g) means that the 


first marble drawn was red and the second one green. The probability of each one 
of these outcomes is é 


5. :. If he wins, he only wins $1; if he loses, he loses $3. 


F = EU FE‘, implying since E and FE are disjoint that P(F) = P(E) + P(FE*). 
17. P{end} = 1 — P{continue} 
= 1 — [Prob(H, H, H) + Prob(T, T, T)] 


Ai. ode A 111 
Fair coin: P{end} = 1 | 


Fea ey mal ea) 


2 
4 


111 3 3 3 
Bi in: P a4 Ss, ay 
iased coin: P{end} E Ad ae 4.4 7 


16 
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19. E = event at least 1 six 


number of waystogetE — 11 


P(E) = = 
©) number of sample points 36 


D = event two faces are different 


P(D) = 1 — P(two faces the same) = 1 — — = 


36.6 
P(ED) _ 10/36 1 


PONDS 3) 5/6 3 


25. (a) P{pair} = P{second card is same denomination as first} 
_ 2 
Si 


P{pair, different suits} 


b ir | di its} = 
(b) P{pair| different suits} Pidifictcat sais) 


a P{pair} 
~ P{different suits} 
— 3/51 aE 
~ 39/51 13 
27; P(E,) =1 
P(E2|E1) = is 
2|E1) = SI 
since 12 cards are in the ace of spades pile and 39 are not. 
26 
P(E3\/E,E2) = — 
(E3|E £2) 30 


since 24 cards are in the piles of the two aces and 26 are in the other two piles. 


14243 A9 
So 


9 2 1 
P{each pile has an ace} = (=) (3) (3) 


P{George, not Bill} 
P{exactly 1} 


= P{G, not B} 
~ P{G, not B} + P{B, not G} 
7 (0.4)(0.3) 
~ (0.4)(0.3) + (0.7) (0.6) 
2 


9 


30. (a) P{Georgelexactly 1 hit} = 
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(b) P{G, hit} 
P{hit} 
AIG 0.4 20 


~ Pthit} 1—(0.3)(0.6) 41 


P{Gl|hit} = 


32. Let Ej = event person i selects own hat. 
P (no one selects hat ) 
=1-P(F, VE, U---UE,) 


=1- [Pe - 


<n 


=1-)°PEj,)- > PE, En)- > PE, E,Ei,) ++ 


y<12 iy <12 <13 


+ (-1)"P(E,E2 --- En) 


> P(E En) +++ + (-1)"* PEED - En] 


Let k € {1,2,...,}. P(E;, E;,E;,) = number of ways k specific men can select own 
hats + total number of ways hats can be arranged = (1 — k)!/n!. Number of terms in 
summation )j, <j, <...<j, = number of ways to choose k variables out of 1 variables = 


(7) = nl/ki(n — k)!. Thus, 


S> PE, En Ei = Yo (n—k)! 


n\ 
1 <++<ip 1 <:+<ip 
_ (n (n —k)! aA 
“Lk ni OR 
ak | hat) = 1 : : : 
.. P(no one selects own hat) = 1 — Tt Tay] aes 
1 1 ni 
Hp as ee) nl 
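40. (a) $F$ = event fair coin flipped; $U$ = event two-headed coin flipped.
\[ P(F \mid H) = \frac{P(H \mid F)P(F)}{P(H \mid F)P(F) + P(H \mid U)P(U)} = \frac{\frac{1}{2}\cdot\frac{1}{2}}{\frac{1}{2}\cdot\frac{1}{2} + 1\cdot\frac{1}{2}} = \frac{1/4}{3/4} = \frac{1}{3} \]
(b)
\[ P(F \mid HH) = \frac{P(HH \mid F)P(F)}{P(HH \mid F)P(F) + P(HH \mid U)P(U)} = \frac{\frac{1}{4}\cdot\frac{1}{2}}{\frac{1}{4}\cdot\frac{1}{2} + 1\cdot\frac{1}{2}} = \frac{1/8}{5/8} = \frac{1}{5} \]
(c)
\[ P(F \mid HHT) = \frac{P(HHT \mid F)P(F)}{P(HHT \mid F)P(F) + P(HHT \mid U)P(U)} = \frac{P(HHT \mid F)P(F)}{P(HHT \mid F)P(F) + 0} = 1 \]
since the fair coin is the only one that can show tails.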
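
The three posteriors in Exercise 40 follow one pattern, which can be sketched as a small function. This is an illustration under the assumption that each coin is chosen with prior probability 1/2; the function name is hypothetical.

```python
from fractions import Fraction

def posterior_fair(heads, tails):
    """P(fair coin | observed flips), assuming prior 1/2 on fair vs. two-headed."""
    prior = Fraction(1, 2)
    like_fair = Fraction(1, 2) ** (heads + tails)
    # A two-headed coin produces the data with probability 1 unless a tail appears.
    like_two_headed = Fraction(1) if tails == 0 else Fraction(0)
    numerator = like_fair * prior
    return numerator / (numerator + like_two_headed * prior)

print(posterior_fair(1, 0), posterior_fair(2, 0), posterior_fair(2, 1))  # 1/3 1/5 1
```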
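45. Let $B_i$ = event $i$th ball is black; $R_i$ = event $i$th ball is red.
\[
\begin{aligned}
P(B_1 \mid R_2) &= \frac{P(R_2 \mid B_1)P(B_1)}{P(R_2 \mid B_1)P(B_1) + P(R_2 \mid R_1)P(R_1)} \\
&= \frac{\dfrac{r}{b+r+c}\cdot\dfrac{b}{b+r}}{\dfrac{r}{b+r+c}\cdot\dfrac{b}{b+r} + \dfrac{r+c}{b+r+c}\cdot\dfrac{r}{b+r}} \\
&= \frac{rb}{rb + (r+c)r} \\
&= \frac{b}{b+r+c}
\end{aligned}
\]
48. Let $C$ be the event that the randomly chosen family owns a car, and let $H$ be the
event that the randomly chosen family owns a house.
\[ P(CH^c) = P(C) - P(CH) = 0.6 - 0.2 = 0.4 \]
and
\[ P(C^cH) = P(H) - P(CH) = 0.3 - 0.2 = 0.1 \]
giving the result
\[ P(CH^c) + P(C^cH) = 0.5 \]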


Chapter 2 

4. (a) 1,2,3,4,5,6. 
(b) 1,2,3,4,5,6. 
(c) 2, 3, ..., 11, 12.
(d) -5, -4, ..., 4, 5.

\[ \binom{4}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^2 = \frac{3}{8} \]
16. $1 - (0.95)^{52} - 52(0.95)^{51}(0.05)$.


23. In order for $X$ to equal $n$, the first $n - 1$ flips must have $r - 1$ heads, and then the
$n$th flip must land heads. By independence the desired probability is thus
\[ \binom{n-1}{r-1} p^{r-1}(1-p)^{n-r} \times p \]
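27.
\[
\begin{aligned}
P\{\text{same number of heads}\} &= \sum_i P\{A = i, B = i\} \\
&= \sum_i \binom{k}{i}\left(\frac{1}{2}\right)^k \binom{n-k}{i}\left(\frac{1}{2}\right)^{n-k} \\
&= \sum_i \binom{k}{i}\binom{n-k}{i}\left(\frac{1}{2}\right)^n
\end{aligned}
\]
Another argument is as follows:
\[
\begin{aligned}
P\{\#\text{ heads of } A = \#\text{ heads of } B\} &= P\{\#\text{ tails of } A = \#\text{ heads of } B\} \quad \text{since coin is fair} \\
&= P\{k - \#\text{ heads of } A = \#\text{ heads of } B\} \\
&= P\{k = \text{total } \#\text{ heads}\}
\end{aligned}
\]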
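
The two arguments in Exercise 27 give the same number, which amounts to the identity $\sum_i \binom{k}{i}\binom{n-k}{i} = \binom{n}{k}$. A quick numerical check, assuming (as in the sum above) that $A$ flips $k$ of the $n$ fair coins and $B$ flips the other $n-k$; the function names are hypothetical:

```python
from fractions import Fraction
from math import comb

def p_same_heads(n, k):
    """Sum over i of P{A = i, B = i}; A flips k fair coins, B flips n - k."""
    return sum(comb(k, i) * comb(n - k, i) for i in range(k + 1)) * Fraction(1, 2) ** n

def p_total_heads_is_k(n, k):
    """P{total number of heads in n fair flips equals k}."""
    return comb(n, k) * Fraction(1, 2) ** n

# Both expressions agree for every split of up to 11 coins.
for n in range(1, 12):
    for k in range(n + 1):
        assert p_same_heads(n, k) == p_total_heads_is_k(n, k)
```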


38. $c = 2$, $P\{X > 2\} = \int_2^\infty 2e^{-2x}\,dx = e^{-4}$


47. Let $X_i$ be 1 if trial $i$ is a success and 0 otherwise.
(a) The largest value is 0.6. If $X_1 = X_2 = X_3$, then
\[ 1.8 = E[X] = 3E[X_1] = 3P\{X_1 = 1\} \]
and so $P\{X = 3\} = P\{X_1 = 1\} = 0.6$. That this is the largest value is seen by
Markov's inequality, which yields
\[ P\{X \ge 3\} \le E[X]/3 = 0.6 \]
(b) The smallest value is 0. To construct a probability scenario for which
$P\{X = 3\} = 0$, let $U$ be a uniform random variable on (0,1), and define
\[ X_1 = \begin{cases} 1, & \text{if } U \le 0.6 \\ 0, & \text{otherwise} \end{cases} \]
\[ X_2 = \begin{cases} 1, & \text{if } U \ge 0.4 \\ 0, & \text{otherwise} \end{cases} \]
\[ X_3 = \begin{cases} 1, & \text{if either } U \le 0.3 \text{ or } U \ge 0.7 \\ 0, & \text{otherwise} \end{cases} \]
It is easy to see that
\[ P\{X_1 = X_2 = X_3 = 1\} = 0 \]
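
The construction in part (b) can be verified exactly by splitting (0,1) at the thresholds 0.3, 0.4, 0.6, 0.7, since each $X_i$ is constant on the resulting subintervals. The code below is a sketch with hypothetical names; it evaluates the indicators at interval midpoints and weights them by interval length.

```python
from fractions import Fraction as F

CUTS = [F(0), F(3, 10), F(4, 10), F(6, 10), F(7, 10), F(1)]

def trials(u):
    """Indicators (X1, X2, X3) as functions of the uniform value u."""
    x1 = 1 if u <= F(6, 10) else 0
    x2 = 1 if u >= F(4, 10) else 0
    x3 = 1 if (u <= F(3, 10) or u >= F(7, 10)) else 0
    return x1, x2, x3

def success_probabilities():
    """Return ([P{X1=1}, P{X2=1}, P{X3=1}], P{X1=X2=X3=1})."""
    marginals = [F(0), F(0), F(0)]
    all_three = F(0)
    for a, b in zip(CUTS, CUTS[1:]):
        x = trials((a + b) / 2)  # indicators are constant on (a, b)
        for j in range(3):
            marginals[j] += (b - a) * x[j]
        if sum(x) == 3:
            all_three += b - a
    return marginals, all_three

marginals, all_three = success_probabilities()
print(marginals, all_three)
```

Each trial succeeds with probability 0.6, so $E[X] = 1.8$ as required, while all three never succeed together.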
48. If $X$ is a nonnegative random variable, and $g$ is a differentiable function with
$g(0) = 0$, then
\[ E[g(X)] = \int_0^\infty P(X > t)\,g'(t)\,dt \]
Let $f$ be the probability density function of $X$. One way to prove the result is to
integrate by parts ($dv = g'(t)\,dt$, $u = P(X > t)$) to obtain
\[ \int_0^\infty P(X > t)\,g'(t)\,dt = P(X > t)\,g(t)\Big|_0^\infty + \int_0^\infty g(t)f(t)\,dt = E[g(X)] \]
Another way is to let $I(t)$ be the indicator function for the event that $X > t$. Then,
\[ g(X) = \int_0^X g'(t)\,dt = \int_0^\infty I(t)\,g'(t)\,dt \]
Now take expectations of both sides to obtain the result.

49. $E[X^2] - (E[X])^2 = \mathrm{Var}(X) = E[(X - E[X])^2] \ge 0$. There is equality when
$\mathrm{Var}(X) = 0$, that is, when $X$ is constant.

64. See Section 5.2.3 of Chapter 5. Another way is to use moment generating functions.
The moment generating function of the sum of $n$ independent exponentials with
rate $\lambda$ is equal to the product of their moment generating functions. That is, it is
$[\lambda/(\lambda - t)]^n$. But this is precisely the moment generating function of a gamma with
parameters $n$ and $\lambda$.

70. Let $X_i$ be Poisson with mean 1. Then
\[ P\Big\{\sum_{i=1}^n X_i \le n\Big\} = e^{-n}\sum_{k=0}^n \frac{n^k}{k!} \]
But for $n$ large $\sum_{i=1}^n X_i - n$ has approximately a normal distribution with mean 0, and
so the result follows.

72. For the matching problem, letting $X = X_1 + \cdots + X_N$, where
\[ X_i = \begin{cases} 1, & \text{if } i\text{th man selects his own hat} \\ 0, & \text{otherwise} \end{cases} \]
we obtain
\[ \mathrm{Var}(X) = \sum_{i=1}^N \mathrm{Var}(X_i) + 2\sum\sum_{i<j} \mathrm{Cov}(X_i, X_j) \]
Since $P\{X_i = 1\} = 1/N$, we see
\[ \mathrm{Var}(X_i) = \frac{1}{N}\left(1 - \frac{1}{N}\right) = \frac{N-1}{N^2} \]
Also,
\[ \mathrm{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j] \]
Now,
\[ X_iX_j = \begin{cases} 1, & \text{if the } i\text{th and } j\text{th men both select their own hats} \\ 0, & \text{otherwise} \end{cases} \]
and thus
\[ E[X_iX_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{1}{N}\,\frac{1}{N-1} \]
Hence,
\[ \mathrm{Cov}(X_i, X_j) = \frac{1}{N(N-1)} - \left(\frac{1}{N}\right)^2 = \frac{1}{N^2(N-1)} \]
and
\[ \mathrm{Var}(X) = \frac{N-1}{N} + 2\binom{N}{2}\frac{1}{N^2(N-1)} = \frac{N-1}{N} + \frac{1}{N} = 1 \]
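
The result $\mathrm{Var}(X) = 1$ for the matching problem holds for every $N \ge 2$, which is easy to confirm by enumeration for small $N$. The following is a sketch with a hypothetical function name; it computes the exact mean and variance of the number of matches over all $N!$ permutations.

```python
from fractions import Fraction
from itertools import permutations

def match_moments(n):
    """Exact mean and variance of the number of fixed points of a random permutation."""
    counts = [sum(1 for i, h in enumerate(perm) if i == h)
              for perm in permutations(range(n))]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean ** 2

for n in range(2, 8):
    assert match_moments(n) == (1, 1)
```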
Chapter 3 
2. Intuitively it would seem that the first head would be equally likely to occur on any
of trials $1,\ldots,n-1$. That is, it is intuitive that
\[ P\{X_1 = i \mid X_1 + X_2 = n\} = \frac{1}{n-1}, \quad i = 1,\ldots,n-1 \]
Formally,
\[
\begin{aligned}
P\{X_1 = i \mid X_1 + X_2 = n\} &= \frac{P\{X_1 = i, X_1 + X_2 = n\}}{P\{X_1 + X_2 = n\}} \\
&= \frac{P\{X_1 = i, X_2 = n-i\}}{P\{X_1 + X_2 = n\}} \\
&= \frac{p(1-p)^{i-1}\,p(1-p)^{n-i-1}}{\binom{n-1}{1}p(1-p)^{n-2}p} \\
&= \frac{1}{n-1}
\end{aligned}
\]
In the preceding, the next to last equality uses the independence of $X_1$ and $X_2$ to
evaluate the numerator and the fact that $X_1 + X_2$ has a negative binomial distribution
to evaluate the denominator.
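
The conditional uniformity in Exercise 2 does not depend on $p$, which a direct computation makes visible. The sketch below assumes $X_1$ and $X_2$ are independent geometrics with success probability $p$; the function name is hypothetical.

```python
from fractions import Fraction

def conditional_pmf(n, p):
    """P{X1 = i | X1 + X2 = n} for i = 1..n-1, with X1, X2 i.i.d. geometric(p)."""
    q = 1 - p
    joint = [p * q ** (i - 1) * p * q ** (n - i - 1) for i in range(1, n)]
    total = sum(joint)
    return [x / total for x in joint]

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    assert conditional_pmf(7, p) == [Fraction(1, 6)] * 6
```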


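6.
\[
\begin{aligned}
p_{X|Y}(1 \mid 3) &= \frac{P\{X = 1, Y = 3\}}{P\{Y = 3\}} = \frac{P\{\text{1 white, 3 black, 2 red}\}}{P\{\text{3 black}\}} \\
&= \frac{\dfrac{6!}{1!\,3!\,2!}\left(\dfrac{3}{14}\right)^1\left(\dfrac{5}{14}\right)^3\left(\dfrac{6}{14}\right)^2}{\dfrac{6!}{3!\,3!}\left(\dfrac{5}{14}\right)^3\left(\dfrac{9}{14}\right)^3} = \frac{4}{9}
\end{aligned}
\]
\[ p_{X|Y}(0 \mid 3) = \frac{8}{27}, \qquad p_{X|Y}(2 \mid 3) = \frac{2}{9}, \qquad p_{X|Y}(3 \mid 3) = \frac{1}{27} \]
\[ E[X \mid Y = 1] = \frac{5}{3} \]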


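13. The conditional density of $X$ given that $X > 1$ is
\[ f_{X|X>1}(x) = \frac{f(x)}{P\{X > 1\}} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda}} \quad \text{when } x > 1 \]
\[ E[X \mid X > 1] = e^{\lambda}\int_1^\infty x\lambda e^{-\lambda x}\,dx = 1 + 1/\lambda \]
by integration by parts. This latter result also follows immediately by the lack of
memory property of the exponential.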


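19.
\[
\begin{aligned}
\int E[X \mid Y = y]\,f_Y(y)\,dy &= \int\!\!\int x f_{X|Y}(x \mid y)\,dx\,f_Y(y)\,dy \\
&= \int\!\!\int x\,\frac{f(x,y)}{f_Y(y)}\,dx\,f_Y(y)\,dy \\
&= \int x \int f(x,y)\,dy\,dx \\
&= \int x f_X(x)\,dx \\
&= E[X]
\end{aligned}
\]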
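23. Let $X$ denote the first time a head appears. Let us obtain an equation for $E[N \mid X]$
by conditioning on the next two flips after $X$. This gives
\[ E[N \mid X] = E[N \mid X,h,h]\,p^2 + E[N \mid X,h,t]\,pq + E[N \mid X,t,h]\,pq + E[N \mid X,t,t]\,q^2 \]
where $q = 1 - p$. Now
\[ E[N \mid X,h,h] = X + 1, \qquad E[N \mid X,h,t] = X + 1 \]
\[ E[N \mid X,t,h] = X + 2, \qquad E[N \mid X,t,t] = X + 2 + E[N] \]
Substituting back gives
\[ E[N \mid X] = (X + 1)(p^2 + pq) + (X + 2)pq + (X + 2 + E[N])q^2 \]
Taking expectations, and using the fact that $X$ is geometric with mean $1/p$, we obtain
\[ E[N] = 1 + p + q + 2pq + q^2/p + 2q^2 + q^2E[N] \]
Solving for $E[N]$ yields
\[ E[N] = \frac{2 + 2q + q^2/p}{1 - q^2} \]

42. (a)
\[
\begin{aligned}
E[e^{tX^2}] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx^2}e^{-(x-\mu)^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\{-(x^2 - 2\mu x + \mu^2 - 2tx^2)/2\}\,dx \\
&= e^{-\mu^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\Big\{-(1-2t)\Big(x^2 - \frac{2\mu}{1-2t}x\Big)\Big/2\Big\}\,dx
\end{aligned}
\]
Thus, with $\sigma^2 = \frac{1}{1-2t}$,
\[ E[e^{tX^2}] = e^{-\mu^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\{-(x^2 - 2\sigma^2\mu x)/2\sigma^2\}\,dx \]
Using that
\[ x^2 - 2\sigma^2\mu x = (x - \sigma^2\mu)^2 - \mu^2\sigma^4 \]
we have
\[
\begin{aligned}
E[e^{tX^2}] &= e^{-\mu^2/2 + \mu^2\sigma^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\{-(x - \sigma^2\mu)^2/2\sigma^2\}\,dx \\
&= e^{-(1-\sigma^2)\mu^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\{-y^2/2\sigma^2\}\,dy \\
&= \sigma e^{-(1-\sigma^2)\mu^2/2} \\
&= (1-2t)^{-1/2}\exp\Big\{\frac{t\mu^2}{1-2t}\Big\}
\end{aligned}
\]
(b)
\[
\begin{aligned}
E\Big[\exp\Big\{t\sum_{i=1}^n (Z_i + \mu_i)^2\Big\}\Big] &= \prod_{i=1}^n E[\exp\{t(Z_i + \mu_i)^2\}] \\
&= (1-2t)^{-n/2}\exp\Big\{\frac{t}{1-2t}\sum_{i=1}^n \mu_i^2\Big\}
\end{aligned}
\]
(c)
\[ \frac{d}{dt}(1-2t)^{-n/2} = n(1-2t)^{-n/2-1} \]
\[ \frac{d^2}{dt^2}(1-2t)^{-n/2} = 2n(n/2 + 1)(1-2t)^{-n/2-2} \]
Hence, if $\chi_n^2$ is chi-squared with $n$ degrees of freedom then evaluating the
preceding at $t = 0$ gives
\[ E[\chi_n^2] = n, \qquad \mathrm{Var}(\chi_n^2) = n^2 + 2n - n^2 = 2n \]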
(d) Conditioning on $K$ yields
\[
\begin{aligned}
E[e^{tW}] &= \sum_{k=0}^{\infty} E[e^{tW} \mid K = k]\,e^{-\theta/2}(\theta/2)^k/k! \\
&= \sum_{k=0}^{\infty} (1-2t)^{-(n+2k)/2}\,e^{-\theta/2}(\theta/2)^k/k! \\
&= (1-2t)^{-n/2}e^{-\theta/2}\sum_{k=0}^{\infty} (1-2t)^{-k}(\theta/2)^k/k! \\
&= (1-2t)^{-n/2}e^{-\theta/2}\sum_{k=0}^{\infty} \Big(\frac{\theta}{2(1-2t)}\Big)^k\Big/k! \\
&= (1-2t)^{-n/2}\exp\Big\{-\frac{\theta}{2} + \frac{\theta}{2(1-2t)}\Big\} \\
&= (1-2t)^{-n/2}\exp\Big\{\frac{t\theta}{1-2t}\Big\}
\end{aligned}
\]
Because the preceding is the moment generating function of a noncentral chi-squared
random variable with parameters $n$ and $\theta$, and the moment generating
function uniquely determines the distribution, the result is proven.

(e) From the preceding, we have
\[ E[W \mid K = k] = E[\chi_{n+2k}^2] = n + 2k \]
\[ \mathrm{Var}(W \mid K = k) = \mathrm{Var}(\chi_{n+2k}^2) = 2n + 4k \]
Hence,
\[ E[W] = E[E[W \mid K]] = E[n + 2K] = n + 2E[K] = n + \theta \]
and the conditional variance formula yields
\[ \mathrm{Var}(W) = E[2n + 4K] + \mathrm{Var}(n + 2K) = 2n + 2\theta + 2\theta = 2n + 4\theta \]


47.
\[ E[X^2Y^2 \mid X] = X^2E[Y^2 \mid X] \ge X^2(E[Y \mid X])^2 = X^2 \]
The inequality follows since for any random variable $U$, $E[U^2] \ge (E[U])^2$ and this
remains true when conditioning on some other random variable $X$. Taking expectations
of the preceding shows that
\[ E[(XY)^2] \ge E[X^2] \]
As
\[ E[XY] = E[E[XY \mid X]] = E[XE[Y \mid X]] = E[X] \]
the results follow.


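53.
\[
\begin{aligned}
P\{X = n\} &= \int_0^\infty P\{X = n \mid \lambda\}\,e^{-\lambda}\,d\lambda \\
&= \int_0^\infty \frac{e^{-\lambda}\lambda^n}{n!}\,e^{-\lambda}\,d\lambda \\
&= \int_0^\infty e^{-2\lambda}\lambda^n\,\frac{d\lambda}{n!} \\
&= \int_0^\infty e^{-t}t^n\,\frac{dt}{n!}\left(\frac{1}{2}\right)^{n+1}
\end{aligned}
\]
The results follow since $\int_0^\infty e^{-t}t^n\,dt = \Gamma(n+1) = n!$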


60. (a) Intuitive that $f(p)$ is increasing in $p$, since the larger $p$ is the greater is the
advantage of going first.
(b) 1.
(c) $\frac{1}{2}$ since the advantage of going first becomes nil.
(d) Condition on the outcome of the first flip:
\[ f(p) = P\{\text{I wins} \mid h\}\,p + P\{\text{I wins} \mid t\}(1-p) = p + [1 - f(p)](1-p) \]
Therefore,
\[ f(p) = \frac{1}{2-p} \]
67. Part (a) is proven by noting that a run of $j$ successive heads can occur within the
first $n$ flips in two mutually exclusive ways. Either there is a run of $j$ successive
heads within the first $n - 1$ flips; or there is no run of $j$ successive heads within
the first $n - j - 1$ flips, flip $n - j$ is not a head, and flips $n - j + 1$ through $n$ are
all heads.

Let $A$ be the event that a run of $j$ successive heads occurs within the first $n$, $(n \ge j)$,
flips. Conditioning on $X$, the trial number of the first non-head, gives the following
\[
\begin{aligned}
P_j(n) &= \sum_k P(A \mid X = k)\,p^{k-1}(1-p) \\
&= \sum_{k=1}^{j} P(A \mid X = k)\,p^{k-1}(1-p) + \sum_{k=j+1}^{\infty} P(A \mid X = k)\,p^{k-1}(1-p) \\
&= \sum_{k=1}^{j} P_j(n-k)\,p^{k-1}(1-p) + \sum_{k=j+1}^{\infty} p^{k-1}(1-p) \\
&= \sum_{k=1}^{j} P_j(n-k)\,p^{k-1}(1-p) + p^j
\end{aligned}
\]

73. Condition on the value of the sum prior to going over 100. In all cases the most
likely value is 101. (For instance, if this sum is 98 then the final sum is equally likely
to be either 101, 102, 103, or 104. If the sum prior to going over is 95, then the final
sum is 101 with certainty.)

93. (a) By symmetry, for any value of $(T_1,\ldots,T_m)$, the random vector $(I_1,\ldots,I_m)$ is
equally likely to be any of the $m!$ permutations.
(b)
\[
\begin{aligned}
E[N] &= \sum_{i=1}^{m} E[N \mid X = i]\,P\{X = i\} \\
&= \frac{1}{m}\sum_{i=1}^{m} E[N \mid X = i] \\
&= \frac{1}{m}\Big(\sum_{i=1}^{m-1}\big(E[T_i] + E[N]\big) + E[T_{m-1}]\Big)
\end{aligned}
\]
where the final equality used the independence of $X$ and $T_i$. Therefore,
\[ E[N] = E[T_{m-1}] + \sum_{i=1}^{m-1} E[T_i] \]
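(c)
\[ E[T_i] = \sum_{j=1}^{i} \frac{m}{m+1-j} \]
(d)
\[
\begin{aligned}
E[N] &= \sum_{j=1}^{m-1} \frac{m}{m+1-j} + \sum_{i=1}^{m-1}\sum_{j=1}^{i} \frac{m}{m+1-j} \\
&= \sum_{j=1}^{m-1} \frac{m}{m+1-j} + \sum_{j=1}^{m-1}\sum_{i=j}^{m-1} \frac{m}{m+1-j} \\
&= \sum_{j=1}^{m-1} \frac{m}{m+1-j} + \sum_{j=1}^{m-1} \frac{m(m-j)}{m+1-j} \\
&= \sum_{j=1}^{m-1} \Big(\frac{m}{m+1-j} + \frac{m(m-j)}{m+1-j}\Big) \\
&= m(m-1)
\end{aligned}
\]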
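
The closed form $E[N] = m(m-1)$ in Exercise 93 can be checked from the relation $E[N] = E[T_{m-1}] + \sum_{i=1}^{m-1} E[T_i]$ of part (b). The sketch below assumes the coupon-collector form $E[T_i] = \sum_{j=1}^{i} m/(m+1-j)$ for $m$ equally likely outcomes; the function names are hypothetical.

```python
from fractions import Fraction

def expected_T(i, m):
    """E[T_i]: mean number of trials until i distinct outcomes appear (m equally likely)."""
    return sum(Fraction(m, m + 1 - j) for j in range(1, i + 1))

def expected_N(m):
    """E[N] = E[T_{m-1}] + sum_{i=1}^{m-1} E[T_i], as in part (b)."""
    return expected_T(m - 1, m) + sum(expected_T(i, m) for i in range(1, m))

for m in range(2, 12):
    assert expected_N(m) == m * (m - 1)
```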


97. Let $X$ be geometric with parameter $p$. To compute $\mathrm{Var}(X)$, we will use the conditional
variance formula, conditioning on the outcome of the first trial. Let $I$ equal 1 if the
first trial is a success, and let it equal 0 otherwise. If $I = 1$, then $X = 1$; since the
variance of a constant is 0, this gives
\[ \mathrm{Var}(X \mid I = 1) = 0 \]
On the other hand, if $I = 0$ then the conditional distribution of $X$ given that $I = 0$ is
the same as the unconditional distribution of 1 (the first trial) plus a geometric with
parameter $p$ (the number of additional trials needed for a success). Therefore,
\[ \mathrm{Var}(X \mid I = 0) = \mathrm{Var}(X) \]
yielding
\[ E[\mathrm{Var}(X \mid I)] = \mathrm{Var}(X \mid I = 1)P(I = 1) + \mathrm{Var}(X \mid I = 0)P(I = 0) = (1-p)\mathrm{Var}(X) \]
Similarly,
\[ E[X \mid I = 1] = 1, \qquad E[X \mid I = 0] = 1 + E[X] = 1 + \frac{1}{p} \]
which can be written as
\[ E[X \mid I] = 1 + \frac{1}{p}(1 - I) \]
yielding
\[ \mathrm{Var}(E[X \mid I]) = \frac{1}{p^2}\mathrm{Var}(I) = \frac{1}{p^2}\,p(1-p) = \frac{1-p}{p} \]
The conditional variance formula now gives
\[ \mathrm{Var}(X) = E[\mathrm{Var}(X \mid I)] + \mathrm{Var}(E[X \mid I]) = (1-p)\mathrm{Var}(X) + \frac{1-p}{p} \]
or
\[ \mathrm{Var}(X) = \frac{1-p}{p^2} \]
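
As a sanity check on Exercise 97, the value $(1-p)/p^2$ can be compared with the variance computed directly from the geometric mass function $P\{X = k\} = p(1-p)^{k-1}$. This is a numerical sketch only: the function name and the truncation point of the series are arbitrary choices.

```python
def geometric_variance_series(p, terms=10_000):
    """Variance of a geometric(p) variable from a truncated sum over its pmf."""
    q = 1.0 - p
    mean = sum(k * p * q ** (k - 1) for k in range(1, terms + 1))
    second_moment = sum(k * k * p * q ** (k - 1) for k in range(1, terms + 1))
    return second_moment - mean ** 2

for p in (0.2, 0.3, 0.5, 0.9):
    assert abs(geometric_variance_series(p) - (1 - p) / p ** 2) < 1e-8
```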

Chapter 4 

1. $P_{01} = 1$, $P_{10} = \frac{1}{9}$, $P_{21} = \frac{4}{9}$, $P_{32} = 1$
$P_{11} = \frac{4}{9}$, $P_{22} = \frac{4}{9}$
$P_{12} = \frac{4}{9}$, $P_{23} = \frac{1}{9}$


4. Let the state space be $S = \{0, 1, 2, \bar{0}, \bar{1}, \bar{2}\}$, where state $i$ ($\bar{i}$) signifies that the present
value is $i$, and the present day is even (odd).


16. If $P_{ij}$ were (strictly) positive, then $P_{ji}^n$ would be 0 for all $n$ (otherwise, $i$ and $j$ would
communicate). But then the process, starting in $i$, has a positive probability of at
least $P_{ij}$ of never returning to $i$. This contradicts the recurrence of $i$. Hence $P_{ij} = 0$.


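21. The transition probabilities are
\[ P_{ij} = \begin{cases} 1 - 3\alpha, & \text{if } j = i \\ \alpha, & \text{if } j \ne i \end{cases} \]
By symmetry,
\[ P_{ij}^n = \tfrac{1}{3}(1 - P_{ii}^n), \quad j \ne i \]
So, let us prove by induction that
\[ P_{ij}^n = \begin{cases} \frac{1}{4} + \frac{3}{4}(1-4\alpha)^n, & \text{if } j = i \\ \frac{1}{4} - \frac{1}{4}(1-4\alpha)^n, & \text{if } j \ne i \end{cases} \]
As the preceding is true for $n = 1$, assume it for $n$. To complete the induction proof,
we need to show that
\[ P_{ij}^{n+1} = \begin{cases} \frac{1}{4} + \frac{3}{4}(1-4\alpha)^{n+1}, & \text{if } j = i \\ \frac{1}{4} - \frac{1}{4}(1-4\alpha)^{n+1}, & \text{if } j \ne i \end{cases} \]
Now,
\[
\begin{aligned}
P_{ii}^{n+1} &= P_{ii}^n P_{ii} + \sum_{j \ne i} P_{ij}^n P_{ji} \\
&= \Big(\tfrac{1}{4} + \tfrac{3}{4}(1-4\alpha)^n\Big)(1-3\alpha) + 3\Big(\tfrac{1}{4} - \tfrac{1}{4}(1-4\alpha)^n\Big)\alpha \\
&= \tfrac{1}{4} + \tfrac{3}{4}(1-4\alpha)^n(1 - 3\alpha - \alpha) \\
&= \tfrac{1}{4} + \tfrac{3}{4}(1-4\alpha)^{n+1}
\end{aligned}
\]
By symmetry, for $j \ne i$
\[ P_{ij}^{n+1} = \tfrac{1}{3}\big(1 - P_{ii}^{n+1}\big) = \tfrac{1}{4} - \tfrac{1}{4}(1-4\alpha)^{n+1} \]
and the induction is complete.

By letting $n \to \infty$ in the preceding, or by using that the transition probability
matrix is doubly stochastic, or by just using a symmetry argument, we obtain that
$\pi_i = 1/4$, $i = 1, 2, 3, 4$.


27. (a) It is a Markov chain because each individual's state the next period depends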
only on its current state and not on any information about earlier times. 

(b) If $i$ of the $N$ individuals are currently active, then the number of actives in the
next period is the sum of two independent random variables; $R_i$, the number of
the $i$ currently active who remain active in the next period; and $B_i$, the number
of the $N - i$ inactives who become active in the next period. Because $R_i$ is
binomial $(i, \alpha)$, and $B_i$ is binomial $(N - i, \bar{\beta})$, where $\bar{\beta} = 1 - \beta$, we see that
\[ E[X_n \mid X_{n-1} = i] = i\alpha + (N - i)(1 - \beta) = N(1-\beta) + (\alpha + \beta - 1)i \]
Hence,
\[ E[X_n \mid X_{n-1}] = N(1-\beta) + (\alpha + \beta - 1)X_{n-1} \]
giving that
\[ E[X_n] = N(1-\beta) + (\alpha + \beta - 1)E[X_{n-1}] \]
Letting $a = N(1-\beta)$, $b = \alpha + \beta - 1$, the preceding gives
\[
\begin{aligned}
E[X_n] &= a + bE[X_{n-1}] \\
&= a + b(a + bE[X_{n-2}]) = a + ba + b^2E[X_{n-2}] \\
&= a + ba + b^2a + b^3E[X_{n-3}]
\end{aligned}
\]
Continuing this, we arrive at
\[ E[X_n] = a(1 + b + \cdots + b^{n-1}) + b^nE[X_0] \]
Thus,
\[ E[X_n \mid X_0 = i] = a(1 + b + \cdots + b^{n-1}) + b^n i \]
Note that
\[ \lim_{n\to\infty} E[X_n] = \frac{a}{1-b} = \frac{N(1-\beta)}{2 - \alpha - \beta} \]
(c) With $R_i$, $B_i$ as previously defined
\[
\begin{aligned}
P_{ij} &= P(R_i + B_i = j) \\
&= \sum_k P(R_i + B_i = j \mid R_i = k)\binom{i}{k}\alpha^k(1-\alpha)^{i-k} \\
&= \sum_k \binom{N-i}{j-k}(1-\beta)^{j-k}\beta^{N-i-j+k}\binom{i}{k}\alpha^k(1-\alpha)^{i-k}
\end{aligned}
\]
where $\binom{m}{r} = 0$ if $r < 0$ or $r > m$.
(d) Suppose $N = 1$. Then, with 1 standing for active and 0 for inactive, the limiting
probabilities are such that
\[ \pi_0 = \pi_0\beta + \pi_1(1-\alpha) \]
\[ \pi_1 = \pi_0(1-\beta) + \pi_1\alpha \]
\[ \pi_0 + \pi_1 = 1 \]
Solving yields
\[ \pi_1 = \frac{1-\beta}{2-\alpha-\beta}, \qquad \pi_0 = \frac{1-\alpha}{2-\alpha-\beta} \]
Now consider the case of population size $N$. Because each member will, in steady
state, be active with probability $\pi_1$ and because each of the members changes
states independently of each other it follows that the steady state number of
actives has a binomial $(N, \pi_1)$ distribution. Hence, the long-run proportion of
time that exactly $j$ people are active is
\[ \pi_j(N) = \binom{N}{j}\Big(\frac{1-\beta}{2-\alpha-\beta}\Big)^j\Big(\frac{1-\alpha}{2-\alpha-\beta}\Big)^{N-j} \]
Note that the steady state expected number of actives is $N\frac{1-\beta}{2-\alpha-\beta}$, in accord
with what we saw in part (b).
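
The claim in part (d) can be tested directly: build the transition matrix from the formula of part (c) and check that the binomial $(N, \pi_1)$ vector is left unchanged by it. The code below is an illustrative sketch; the function names and the particular values of $N$, $\alpha$, $\beta$ are arbitrary choices, not taken from the exercise.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """Binomial(n, p) probability of k successes; zero outside 0..n."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def transition_matrix(N, alpha, beta):
    """P[i][j] = P(R_i + B_i = j), R_i ~ Bin(i, alpha), B_i ~ Bin(N - i, 1 - beta)."""
    return [[sum(binom_pmf(i, k, alpha) * binom_pmf(N - i, j - k, 1 - beta)
                 for k in range(i + 1))
             for j in range(N + 1)]
            for i in range(N + 1)]

def binomial_stationary(N, alpha, beta):
    """Candidate stationary vector: Binomial(N, pi_1) with pi_1 = (1-beta)/(2-alpha-beta)."""
    pi1 = (1 - beta) / (2 - alpha - beta)
    return [binom_pmf(N, j, pi1) for j in range(N + 1)]

def is_stationary(pi, P):
    """True when pi P = pi holds exactly."""
    n = len(pi)
    return all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))

N, alpha, beta = 4, Fraction(7, 10), Fraction(4, 5)
P = transition_matrix(N, alpha, beta)
pi = binomial_stationary(N, alpha, beta)
assert all(sum(row) == 1 for row in P)
assert is_stationary(pi, P)
# Mean of the stationary vector matches the limit found in part (b).
assert sum(j * pi[j] for j in range(N + 1)) == N * (1 - beta) / (2 - alpha - beta)
```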


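32. With the state being the number of on switches this is a three-state Markov chain.
The equations for the long-run proportions are
\[ \pi_0 = \frac{9}{16}\pi_0 + \frac{1}{4}\pi_1 + \frac{1}{16}\pi_2 \]
\[ \pi_1 = \frac{3}{8}\pi_0 + \frac{1}{2}\pi_1 + \frac{3}{8}\pi_2 \]
\[ \pi_0 + \pi_1 + \pi_2 = 1 \]
This gives the solution
\[ \pi_0 = \frac{2}{7}, \qquad \pi_1 = \frac{3}{7}, \qquad \pi_2 = \frac{2}{7} \]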
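
For Exercise 32, the long-run proportions can be checked against the transition matrix. The sketch below assumes the rule that, when $i$ switches are on, each switch is independently on the next day with probability $(1+i)/4$; that rule is an assumption about the exercise statement, which is not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def switch_chain():
    """3x3 transition matrix on the number of 'on' switches (assumed rule: p = (1+i)/4)."""
    P = []
    for i in range(3):
        p = Fraction(1 + i, 4)
        P.append([comb(2, j) * p ** j * (1 - p) ** (2 - j) for j in range(3)])
    return P

def is_stationary(pi, P):
    """True when pi P = pi holds exactly."""
    n = len(pi)
    return all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))

pi = [Fraction(2, 7), Fraction(3, 7), Fraction(2, 7)]
assert sum(pi) == 1
assert is_stationary(pi, switch_chain())
```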


41. (a) The number of transitions into state $i$ by time $n$, the number of transitions
originating from state $i$ by time $n$, and the number of time periods the chain
is in state $i$ by time $n$ all differ by at most 1. Thus, their long-run proportions
must be equal.

(b) $\pi_iP_{ij}$ is the long-run proportion of transitions that go from state $i$ to state $j$.

(c) $\sum_i \pi_iP_{ij}$ is the long-run proportion of transitions that are into state $j$.

(d) Since $\pi_j$ is also the long-run proportion of transitions that are into state $j$, it
follows that $\pi_j = \sum_i \pi_iP_{ij}$.


47. $\{Y_n, n \ge 1\}$ is a Markov chain with states $(i, j)$.
\[ P_{(i,j),(k,l)} = \begin{cases} 0, & \text{if } j \ne k \\ P_{jl}, & \text{if } j = k \end{cases} \]
where $P_{jl}$ is the transition probability for $\{X_n\}$.
\[
\begin{aligned}
\lim_{n\to\infty} P\{Y_n = (i,j)\} &= \lim_n P\{X_n = i, X_{n+1} = j\} \\
&= \lim_n [P\{X_n = i\}P_{ij}] \\
&= \pi_iP_{ij}
\end{aligned}
\]
62. It is easy to verify that the stationary probabilities are $\pi_i = \frac{1}{n+1}$. Hence, the mean
time to return to the initial position is $n + 1$.


68. (a)
\[ \sum_i \pi_iQ_{ij} = \sum_i \pi_jP_{ji} = \pi_j\sum_i P_{ji} = \pi_j \]
(b) Whether perusing the sequence of states in the forward direction of time or in
the reverse direction, the proportion of time the state is $i$ will be the same.


Chapter 5 


7.
\[
\begin{aligned}
P\{X_1 < X_2 \mid \min(X_1, X_2) = t\} &= \frac{P\{X_1 < X_2, \min(X_1, X_2) = t\}}{P\{\min(X_1, X_2) = t\}} \\
&= \frac{P\{X_1 = t, X_2 > t\}}{P\{X_1 = t, X_2 > t\} + P\{X_2 = t, X_1 > t\}} \\
&= \frac{f_1(t)[1 - F_2(t)]}{f_1(t)[1 - F_2(t)] + f_2(t)[1 - F_1(t)]}
\end{aligned}
\]
Dividing through by $[1 - F_1(t)][1 - F_2(t)]$ yields the result. (Of course, $f_i$ and $F_i$
are the density and distribution function of $X_i$, $i = 1, 2$.) To make the preceding
derivation rigorous, we should replace "$= t$" by $\in (t, t + \epsilon)$ throughout and then
let $\epsilon \to 0$.

10. (a)
\[ E[MX \mid M = X] = E[M^2 \mid M = X] = E[M^2] = \frac{2}{(\lambda + \mu)^2} \]
(b) By the memoryless property of exponentials, given that $M = Y$, $X$ is distributed
as $M + X'$ where $X'$ is an exponential with rate $\lambda$ that is independent of $M$.
Therefore,
\[
\begin{aligned}
E[MX \mid M = Y] &= E[M(M + X')] \\
&= E[M^2] + E[M]E[X'] \\
&= \frac{2}{(\lambda + \mu)^2} + \frac{1}{\lambda(\lambda + \mu)}
\end{aligned}
\]
(c)
\[ E[MX] = E[MX \mid M = X]\frac{\lambda}{\lambda + \mu} + E[MX \mid M = Y]\frac{\mu}{\lambda + \mu} = \frac{2\lambda + \mu}{\lambda(\lambda + \mu)^2} \]
Therefore,
\[ \mathrm{Cov}(X, M) = \frac{\lambda}{\lambda(\lambda + \mu)^2} \]

(a) $1/(2\mu)$.

(b) $1/(4\mu^2)$, since the variance of an exponential is its mean squared.

(c) and (d). By the lack of memory property of the exponential it follows that $A$,
the amount by which $X_{(2)}$ exceeds $X_{(1)}$, is exponentially distributed with rate
$\mu$ and is independent of $X_{(1)}$. Therefore,
\[ E[X_{(2)}] = E[X_{(1)} + A] = \frac{1}{2\mu} + \frac{1}{\mu} \]
\[ \mathrm{Var}(X_{(2)}) = \mathrm{Var}(X_{(1)} + A) = \frac{1}{4\mu^2} + \frac{1}{\mu^2} = \frac{5}{4\mu^2} \]

(a) $\frac{1}{2}$.

(b) $\left(\frac{1}{2}\right)^{n-1}$. Whenever battery 1 is in use and a failure occurs the probability is $\frac{1}{2}$
that it is not battery 1 that has failed.

(c) $\left(\frac{1}{2}\right)^{n-i+1}$, $i > 1$.

(d) $T$ is the sum of $n - 1$ independent exponentials with rate $2\mu$ (since each time a
failure occurs the time until the next failure is exponential with rate $2\mu$).

(e) Gamma with parameters $n - 1$ and $2\mu$.
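36.
\[
\begin{aligned}
E[S(t) \mid N(t) = n] &= sE\Big[\prod_{i=1}^{N(t)} X_i \,\Big|\, N(t) = n\Big] \\
&= sE\Big[\prod_{i=1}^{n} X_i\Big] \\
&= s(E[X])^n \\
&= s(1/\mu)^n
\end{aligned}
\]
Thus,
\[
\begin{aligned}
E[S(t)] &= s\sum_n (1/\mu)^n e^{-\lambda t}(\lambda t)^n/n! \\
&= se^{-\lambda t}\sum_n (\lambda t/\mu)^n/n! \\
&= se^{-\lambda t + \lambda t/\mu}
\end{aligned}
\]
By the same reasoning
\[ E[S^2(t) \mid N(t) = n] = s^2(E[X^2])^n = s^2(2/\mu^2)^n \]
and
\[ E[S^2(t)] = s^2e^{-\lambda t + 2\lambda t/\mu^2} \]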


40. The easiest way is to use Definition 5.3. It is easy to see that $\{N(t), t \ge 0\}$ will also
possess stationary and independent increments. Since the sum of two independent
Poisson random variables is also Poisson, it follows that $N(t)$ is a Poisson random
variable with mean $(\lambda_1 + \lambda_2)t$.


57. (a) $e^{-2}$.
(b) 2 P.M.
(c) $1 - 5e^{-4}$.
60. (a) $\frac{1}{9}$.
(b) $\frac{5}{9}$.
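64. (a) Since, given $N(t)$, each arrival is uniformly distributed on $(0, t)$ it follows that
\[ E[X \mid N(t)] = N(t)\int_0^t (t - s)\,\frac{ds}{t} = N(t)\frac{t}{2} \]
(b) Let $U_1, U_2, \ldots$ be independent uniform $(0, t)$ random variables. Then
\[ \mathrm{Var}(X \mid N(t) = n) = \mathrm{Var}\Big[\sum_{i=1}^n (t - U_i)\Big] = n\,\mathrm{Var}(U_i) = n\frac{t^2}{12} \]
(c) By parts (a) and (b) and the conditional variance formula,
\[ \mathrm{Var}(X) = \mathrm{Var}\Big(\frac{N(t)t}{2}\Big) + E\Big[\frac{N(t)t^2}{12}\Big] = \frac{\lambda t\,t^2}{4} + \frac{\lambda t\,t^2}{12} = \frac{\lambda t^3}{3} \]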
79. Consider a Poisson process with rate $\lambda$ in which an event at time $t$ is counted with
probability $\lambda(t)/\lambda$ independently of the past. Clearly such a process will have
independent increments. In addition,
\[ P\{\text{2 or more counted events in } (t, t+h)\} \le P\{\text{2 or more events in } (t, t+h)\} = o(h) \]
and
\[
\begin{aligned}
P\{\text{1 counted event in } (t, t+h)\} &= P\{\text{1 counted} \mid \text{1 event}\}P(\text{1 event}) + P\{\text{1 counted} \mid \ge 2 \text{ events}\}P\{\ge 2\} \\
&= \int_t^{t+h} \frac{\lambda(s)}{\lambda}\,\frac{ds}{h}\,(\lambda h + o(h)) + o(h) \\
&= \frac{\lambda(t)}{\lambda}\lambda h + o(h) \\
&= \lambda(t)h + o(h)
\end{aligned}
\]

84. There is a record whose value is between $t$ and $t + dt$ if the first $X$ larger than $t$ lies
between $t$ and $t + dt$. From this we see that, independent of all record values less
than $t$, there will be one between $t$ and $t + dt$ with probability $\lambda(t)\,dt$ where $\lambda(t)$ is
the failure rate function given by
\[ \lambda(t) = \frac{f(t)}{1 - F(t)} \]
Since the counting process of record values has, by the preceding, independent
increments we can conclude (since there cannot be multiple record values because the $X_i$
are continuous) that it is a nonhomogeneous Poisson process with intensity function
$\lambda(t)$. When $f$ is the exponential density, $\lambda(t) = \lambda$ and so the counting process of
record values becomes an ordinary Poisson process with rate $\lambda$.

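91. To begin, note that
\[
\begin{aligned}
P\Big\{X_1 > \sum_{i=2}^n X_i\Big\} &= P\{X_1 > X_2\}\,P\{X_1 - X_2 > X_3 \mid X_1 > X_2\} \\
&\quad \times P\{X_1 - X_2 - X_3 > X_4 \mid X_1 > X_2 + X_3\} \cdots \\
&\quad \times P\{X_1 - X_2 - \cdots - X_{n-1} > X_n \mid X_1 > X_2 + \cdots + X_{n-1}\} \\
&= \left(\frac{1}{2}\right)^{n-1} \quad \text{by lack of memory}
\end{aligned}
\]
Hence,
\[ P\Big\{M > \sum_{i=1}^n X_i - M\Big\} = \sum_{i=1}^n P\Big\{X_i > \sum_{j \ne i} X_j\Big\} = \frac{n}{2^{n-1}} \]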
Chapter 6 


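2. Let $N_A(t)$ be the number of organisms in state $A$ and let $N_B(t)$ be the number of
organisms in state $B$. Then $\{N_A(t), N_B(t)\}$ is a continuous-time Markov chain with
\[ \nu_{\{n,m\}} = \alpha n + \beta m \]
\[ P_{\{n,m\},\{n-1,m+1\}} = \frac{\alpha n}{\alpha n + \beta m} \]
\[ P_{\{n,m\},\{n+2,m-1\}} = \frac{\beta m}{\alpha n + \beta m} \]

4. Let $N(t)$ denote the number of customers in the station at time $t$. Then $\{N(t)\}$ is a
birth and death process with
\[ \lambda_n = \lambda\alpha_n, \qquad \mu_n = \mu \]
7. (a) Yes!
(b) For $\mathbf{n} = (n_1, \ldots, n_i, n_{i+1}, \ldots, n_{k-1})$ let
\[ S_i(\mathbf{n}) = (n_1, \ldots, n_i - 1, n_{i+1} + 1, \ldots, n_{k-1}), \quad i = 1, \ldots, k-2 \]
\[ S_{k-1}(\mathbf{n}) = (n_1, \ldots, n_i, n_{i+1}, \ldots, n_{k-1} - 1), \]
\[ S_0(\mathbf{n}) = (n_1 + 1, \ldots, n_i, n_{i+1}, \ldots, n_{k-1}). \]
Then
\[ q_{\mathbf{n}, S_i(\mathbf{n})} = n_i\mu, \quad i = 1, \ldots, k-1 \]
\[ q_{\mathbf{n}, S_0(\mathbf{n})} = \lambda \]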
11. (b) Follows from the hint about using the lack of memory property and the fact
that $\epsilon_i$, the minimum of $j - (i - 1)$ independent exponentials with rate $\lambda$, is
exponential with rate $(j - i + 1)\lambda$.
(c) From parts (a) and (b)
\[ P\{T_1 + \cdots + T_j \le t\} = P\Big\{\max_{1 \le i \le j} X_i \le t\Big\} = (1 - e^{-\lambda t})^j \]
(d) With all probabilities conditional on $X(0) = 1$,
\[
\begin{aligned}
P_{1j}(t) &= P\{X(t) = j\} \\
&= P\{X(t) \ge j\} - P\{X(t) \ge j + 1\} \\
&= P\{T_1 + \cdots + T_{j-1} \le t\} - P\{T_1 + \cdots + T_j \le t\}
\end{aligned}
\]
(e) The sum of $i$ independent geometrics, each having parameter $p = e^{-\lambda t}$, is a
negative binomial with parameters $i, p$. The result follows since starting with
an initial population of $i$ is equivalent to having $i$ independent Yule processes,
each starting with a single individual.

16. Let the state be

2: an acceptable molecule is attached
0: no molecule attached
1: an unacceptable molecule is attached.

Then, this is a birth and death process with balance equations
\[ \mu_1P_1 = \lambda(1 - \alpha)P_0 \]
\[ \mu_2P_2 = \lambda\alpha P_0 \]
Since $\sum_{i=0}^2 P_i = 1$, we get
\[ P_2 = \Big[1 + \frac{\mu_2}{\lambda\alpha} + \frac{1-\alpha}{\alpha}\frac{\mu_2}{\mu_1}\Big]^{-1} = \frac{\lambda\alpha\mu_1}{\lambda\alpha\mu_1 + \mu_1\mu_2 + \lambda(1-\alpha)\mu_2} \]
where $P_2$ is the percentage of time the site is occupied by an acceptable molecule.
The percentage of time the site is occupied by an unacceptable molecule is
\[ P_1 = \frac{1-\alpha}{\alpha}\frac{\mu_2}{\mu_1}P_2 = \frac{\lambda(1-\alpha)\mu_2}{\lambda\alpha\mu_1 + \mu_1\mu_2 + \lambda(1-\alpha)\mu_2} \]

19. There are four states. Let state 0 mean that no machines are down, state 1 that
machine 1 is down and 2 is up, state 2 that machine 1 is up and 2 is down, and state
3 that both machines are down. The balance equations are as follows:
\[ (\lambda_1 + \lambda_2)P_0 = \mu_1P_1 + \mu_2P_2 \]
\[ (\mu_1 + \lambda_2)P_1 = \lambda_1P_0 \]
\[ (\lambda_1 + \mu_2)P_2 = \lambda_2P_0 + \mu_1P_3 \]
\[ \mu_1P_3 = \lambda_2P_1 + \lambda_1P_2 \]
\[ P_0 + P_1 + P_2 + P_3 = 1 \]
The equations are easily solved and the proportion of time machine 2 is down is
$P_2 + P_3$.

24. We will let the state be the number of taxis waiting. Then, we get a birth and death
process with $\lambda_n = 1$, $\mu_n = 2$. This is an M/M/1. Therefore:
(a) Average number of taxis waiting $= \frac{\lambda}{\mu - \lambda} = \frac{1}{2 - 1} = 1$.
(b) The proportion of arriving customers that gets taxis is the proportion of arriving
customers that find at least one taxi waiting. The rate of arrival of such
customers is $2(1 - P_0)$. The proportion of such arrivals is therefore
\[ \frac{2(1 - P_0)}{2} = 1 - P_0 = 1 - \Big(1 - \frac{\lambda}{\mu}\Big) = \frac{\lambda}{\mu} = \frac{1}{2} \]
28. Let $P_{ij}^x, v_i^x$ denote the parameters of the $X(t)$ and $P_{ij}^y, v_i^y$ of the $Y(t)$ process; and let
the limiting probabilities be $P_i^x, P_i^y$, respectively. By independence we have that for
the Markov chain $\{X(t), Y(t)\}$ its parameters are
\[ v_{(i,l)} = v_i^x + v_l^y \]
\[ P_{(i,l),(j,l)} = \frac{v_i^x}{v_i^x + v_l^y}P_{ij}^x \]
\[ P_{(i,l),(i,k)} = \frac{v_l^y}{v_i^x + v_l^y}P_{lk}^y \]
and
\[ \lim_{t\to\infty} P\{(X(t), Y(t)) = (i, j)\} = P_i^xP_j^y \]
Hence, we need to show that
\[ P_i^xP_l^y\,v_i^xP_{ij}^x = P_j^xP_l^y\,v_j^xP_{ji}^x \]
(That is, the rate from $(i, l)$ to $(j, l)$ equals the rate from $(j, l)$ to $(i, l)$.) But this follows
from the fact that the rate from $i$ to $j$ in $X(t)$ equals the rate from $j$ to $i$; that is,
\[ P_i^xv_i^xP_{ij}^x = P_j^xv_j^xP_{ji}^x \]
The analysis is similar in looking at pairs $(i, l)$ and $(i, k)$.

33. Suppose first that the waiting room is of infinite size. Let $X_i(t)$ denote the number of
customers at server $i$, $i = 1, 2$. Then since each of the M/M/1 processes $\{X_i(t)\}$ is time
reversible, it follows from Exercise 28 that the vector process $\{(X_1(t), X_2(t)), t \ge 0\}$
is a time reversible Markov chain. Now the process of interest is just the truncation
of this vector process to the set of states $A$ where
\[ A = \{(0, m): m \le 4\} \cup \{(n, 0): n \le 4\} \cup \{(n, m): nm > 0, n + m \le 5\} \]
Hence, the probability that there are $n$ with server 1 and $m$ with server 2 is
\[
\begin{aligned}
P_{n,m} &= k\Big(\frac{\lambda_1}{\mu_1}\Big)^n\Big(1 - \frac{\lambda_1}{\mu_1}\Big)\Big(\frac{\lambda_2}{\mu_2}\Big)^m\Big(1 - \frac{\lambda_2}{\mu_2}\Big) \\
&= C\Big(\frac{\lambda_1}{\mu_1}\Big)^n\Big(\frac{\lambda_2}{\mu_2}\Big)^m, \quad (n, m) \in A
\end{aligned}
\]
The constant $C$ is determined from
\[ \sum P_{n,m} = 1 \]
where the sum is over all $(n, m)$ in $A$.
42. (a) The matrix $P^*$ can be written as
\[ P^* = I + R/v \]
and so $P_{ij}^{*n}$ can be obtained by taking the $i, j$ element of $(I + R/v)^n$, which gives
the result when $v = n/t$.

(b) Uniformization shows that $P_{ij}(t) = E[P_{ij}^{*N}]$, where $N$ is independent of the
Markov chain with transition probabilities $P_{ij}^*$ and is Poisson distributed with
mean $vt$. Since a Poisson random variable with mean $vt$ has standard deviation
$(vt)^{1/2}$, it follows that for large values of $vt$ it should be near $vt$. (For instance,
a Poisson random variable with mean $10^6$ has standard deviation $10^3$ and thus
will, with high probability, be within 3000 of $10^6$.) Hence, since for fixed $i$ and
$j$, $P_{ij}^{*m}$ should not vary much for values of $m$ about $vt$ where $vt$ is large, it follows
that, for large $vt$,
\[ E[P_{ij}^{*N}] \approx P_{ij}^{*n}, \quad \text{where } n = vt \]


Chapter 7 


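3. By the one-to-one correspondence of $m(t)$ and $F$, it follows that $\{N(t), t \ge 0\}$ is a
Poisson process with rate $\frac{1}{2}$. Hence,
\[ P\{N(5) = 0\} = e^{-5/2} \]
6. (a) Consider a Poisson process having rate $\lambda$ and say that an event of the renewal
process occurs whenever one of the events numbered $r, 2r, 3r, \ldots$ of the Poisson
process occurs. Then
\[ P\{N(t) \ge n\} = P\{nr \text{ or more Poisson events by } t\} = \sum_{i=nr}^{\infty} e^{-\lambda t}(\lambda t)^i/i! \]
(b)
\[
\begin{aligned}
E[N(t)] &= \sum_{n=1}^{\infty} P\{N(t) \ge n\} = \sum_{n=1}^{\infty}\sum_{i=nr}^{\infty} e^{-\lambda t}(\lambda t)^i/i! \\
&= \sum_{i=r}^{\infty}\sum_{n=1}^{[i/r]} e^{-\lambda t}(\lambda t)^i/i! = \sum_{i=r}^{\infty} [i/r]\,e^{-\lambda t}(\lambda t)^i/i!
\end{aligned}
\]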
i=r n=1 i=r 


8. (a) The number of replaced machines by time $t$ constitutes a renewal process. The
time between replacements equals $T$, if the lifetime of new machine is $\ge T$; $x$, if
the lifetime of new machine is $x$, $x < T$. Hence,
\[ E[\text{time between replacements}] = \int_0^T xf(x)\,dx + T[1 - F(T)] \]
and the result follows by Proposition 3.1.
(b) The number of machines that have failed in use by time $t$ constitutes a renewal
process. The mean time between in-use failures, $E[F]$, can be calculated by
conditioning on the lifetime of the initial machine as $E[F] = E[E[F \mid \text{lifetime of initial machine}]]$. Now
\[ E[F \mid \text{lifetime of machine is } x] = \begin{cases} x, & \text{if } x \le T \\ T + E[F], & \text{if } x > T \end{cases} \]
Hence,
\[ E[F] = \int_0^T xf(x)\,dx + (T + E[F])[1 - F(T)] \]
or
\[ E[F] = \frac{\int_0^T xf(x)\,dx + T[1 - F(T)]}{F(T)} \]
and the result follows from Proposition 3.1.

18. We can imagine that a renewal corresponds to a machine failure, and each time a
new machine is put in use its life distribution will be exponential with rate $\mu_1$ with
probability $p$, and exponential with rate $\mu_2$ otherwise. Hence, if our state is the
index of the exponential life distribution of the machine presently in use, then this is
a two-state continuous-time Markov chain with intensity rates
\[ q_{1,2} = \mu_1(1 - p), \qquad q_{2,1} = \mu_2p \]
Hence,
\[ P_{11}(t) = \frac{\mu_1(1-p)}{\mu_1(1-p) + \mu_2p}\exp\{-[\mu_1(1-p) + \mu_2p]t\} + \frac{\mu_2p}{\mu_1(1-p) + \mu_2p} \]
with similar expressions for the other transition probabilities ($P_{12}(t) = 1 - P_{11}(t)$,
and $P_{22}(t)$ is the same with $\mu_2p$ and $\mu_1(1 - p)$ switching places). Conditioning on
the initial machine now gives
\[
\begin{aligned}
E[Y(t)] &= pE[Y(t) \mid X(0) = 1] + (1 - p)E[Y(t) \mid X(0) = 2] \\
&= p\Big[\frac{P_{11}(t)}{\mu_1} + \frac{P_{12}(t)}{\mu_2}\Big] + (1 - p)\Big[\frac{P_{21}(t)}{\mu_1} + \frac{P_{22}(t)}{\mu_2}\Big]
\end{aligned}
\]
Finally, we can obtain $m(t)$ from
\[ \mu[m(t) + 1] = t + E[Y(t)] \]
where
\[ \mu = p/\mu_1 + (1 - p)/\mu_2 \]
is the mean interarrival time.
22. Cost of a cycle $= C_1 + C_2I - R(T)(1 - I)$
\[ I = \begin{cases} 1, & \text{if } X < T \\ 0, & \text{if } X \ge T \end{cases} \quad \text{where } X = \text{life of car} \]
Hence,
\[ E[\text{cost of a cycle}] = C_1 + C_2H(T) - R(T)[1 - H(T)] \]
Also,
\[ E[\text{time of cycle}] = \int E[\text{time} \mid X = x]\,h(x)\,dx = \int_0^T xh(x)\,dx + T[1 - H(T)] \]
Thus the average cost per unit time is given by
\[ \frac{C_1 + C_2H(T) - R(T)[1 - H(T)]}{\int_0^T xh(x)\,dx + T[1 - H(T)]} \]
30.
\[ \frac{A(t)}{t} = \frac{t - S_{N(t)}}{t} = 1 - \frac{S_{N(t)}}{t} = 1 - \frac{S_{N(t)}}{N(t)}\,\frac{N(t)}{t} \]
The result follows since $S_{N(t)}/N(t) \to \mu$ (by the strong law of large numbers) and
$N(t)/t \to 1/\mu$.
35. (a) We can view this as an M/G/$\infty$ system where a satellite launching corresponds
to an arrival and $F$ is the service distribution. Hence,
\[ P\{X(t) = k\} = e^{-\lambda(t)}[\lambda(t)]^k/k! \]
where $\lambda(t) = \lambda\int_0^t (1 - F(s))\,ds$.
(b) By viewing the system as an alternating renewal process that is on when there
is at least one satellite orbiting, we obtain
\[ \lim P\{X(t) = 0\} = \frac{1/\lambda}{1/\lambda + E[T]} \]
where $T$, the on time in a cycle, is the quantity of interest. From part (a)
\[ \lim P\{X(t) = 0\} = e^{-\lambda\mu} \]
where $\mu = \int_0^\infty (1 - F(s))\,ds$ is the mean time that a satellite orbits. Hence,
\[ e^{-\lambda\mu} = \frac{1/\lambda}{1/\lambda + E[T]} \]
so
\[ E[T] = \frac{1 - e^{-\lambda\mu}}{\lambda e^{-\lambda\mu}} \]

42. (a) $F_e(x) = \frac{1}{\mu}\int_0^x e^{-y/\mu}\,dy = 1 - e^{-x/\mu}$.

(b) $F_e(x) = \frac{1}{c}\int_0^x dy = \frac{x}{c}$, $0 \le x \le c$.

(c) You will receive a ticket if, starting when you park, an official appears within one
hour. From Example 7.23 the time until the official appears has the distribution
$F_e$, which, by part (b), is the uniform distribution on (0, 2). Thus, the probability
is equal to $\frac{1}{2}$.

49. Think of each interarrival time as consisting of $n$ independent phases, each of which
is exponentially distributed with rate $\lambda$, and consider the semi-Markov process
whose state at any time is the phase of the present interarrival time. Hence, this
semi-Markov process goes from state 1 to 2 to 3 ... to $n$ to 1, and so on. Also the
time spent in each state has the same distribution. Thus, clearly the limiting probability
of this semi-Markov chain is $P_i = 1/n$, $i = 1, \ldots, n$. To compute $\lim P\{Y(t) < x\}$,
we condition on the phase at time $t$ and note that if it is $n - i + 1$, which will be
the case with probability $1/n$, then the time until a renewal occurs will be the sum of
$i$ exponential phases, which will thus have a gamma distribution with parameters $i$
and $\lambda$.


Chapter 8 


2. This problem can be modeled by an M/M/1 queue in which $\lambda = 6$, $\mu = 8$. The
average cost rate will be

$10 per hour per machine $\times$ average number of broken machines

The average number of broken machines is just $L$, which can be computed from
Equation (3.2):
\[ L = \frac{\lambda}{\mu - \lambda} = \frac{6}{2} = 3 \]
Hence, the average cost rate = $30/hour.


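7. To compute $W$ for the M/M/2, set up balance equations as follows:
\[ \lambda P_0 = \mu P_1 \quad \text{(each server has rate } \mu) \]
\[ (\lambda + \mu)P_1 = \lambda P_0 + 2\mu P_2 \]
\[ (\lambda + 2\mu)P_n = \lambda P_{n-1} + 2\mu P_{n+1}, \quad n \ge 2 \]
These have solutions $P_n = (\rho^n/2^{n-1})P_0$ where $\rho = \lambda/\mu$. The boundary condition
$\sum_{n=0}^{\infty} P_n = 1$ implies
\[ P_0 = \frac{1 - \rho/2}{1 + \rho/2} = \frac{2 - \rho}{2 + \rho} \]
Now we have $P_n$, so we can compute $L$, and hence $W$ from $L = \lambda W$:
\[
\begin{aligned}
L = \sum_{n=0}^{\infty} nP_n &= \rho P_0\sum_{n=0}^{\infty} n\Big(\frac{\rho}{2}\Big)^{n-1} \\
&= 2P_0\sum_{n=0}^{\infty} n\Big(\frac{\rho}{2}\Big)^n \\
&= 2\,\frac{2 - \rho}{2 + \rho}\,\frac{\rho/2}{(1 - \rho/2)^2} \quad \text{(See derivation of Equation (8.7).)} \\
&= \frac{4\rho}{(2 + \rho)(2 - \rho)} \\
&= \frac{4\mu\lambda}{(2\mu + \lambda)(2\mu - \lambda)}
\end{aligned}
\]
From $L = \lambda W$ we have
\[ W = W(M/M/2) = \frac{4\mu}{(2\mu + \lambda)(2\mu - \lambda)} \]
The M/M/1 queue with service rate $2\mu$ has
\[ W(M/M/1) = \frac{1}{2\mu - \lambda} \]
from Equation (8.8). We assume that in the M/M/1 queue, $2\mu > \lambda$ so that the queue
is stable. But then $4\mu > 2\mu + \lambda$, or $4\mu/(2\mu + \lambda) > 1$, which implies $W(M/M/2) >
W(M/M/1)$. The intuitive explanation is that if one finds the queue empty in the
M/M/2 case, it would do no good to have two servers. One would be better off with
one faster server. Now let $W_Q^1 = W_Q(M/M/1)$ and $W_Q^2 = W_Q(M/M/2)$. Then,
\[ W_Q^1 = W(M/M/1) - 1/(2\mu) \]
\[ W_Q^2 = W(M/M/2) - 1/\mu \]
So,
\[ W_Q^1 = \frac{\lambda}{2\mu(2\mu - \lambda)} \quad \text{from Equation (8.8)} \]
and
\[ W_Q^2 = \frac{\lambda^2}{\mu(2\mu - \lambda)(2\mu + \lambda)} \]
Then,
\[ W_Q^1 > W_Q^2 \iff \frac{1}{2} > \frac{\lambda}{2\mu + \lambda} \iff \lambda < 2\mu \]
Since we assume $\lambda < 2\mu$ for stability in the M/M/1 case, $W_Q^2 < W_Q^1$ whenever this
comparison is possible, that is, whenever $\lambda < 2\mu$.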
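
The two comparisons in Exercise 7 point in opposite directions: the single fast server gives the smaller time in system, while the two-server queue gives the smaller time in queue. A small numerical sketch makes this concrete; the function names and the sample rates $\lambda = 3$, $\mu = 2$ are illustrative choices, and the series check truncates the sum $L = \sum_n nP_n$ at an arbitrary point.

```python
from fractions import Fraction

def w_mm2(lam, mu):
    """Mean time in system for M/M/2 with per-server rate mu."""
    return 4 * mu / ((2 * mu + lam) * (2 * mu - lam))

def w_mm1_fast(lam, mu):
    """Mean time in system for M/M/1 with service rate 2*mu."""
    return 1 / (2 * mu - lam)

def l_mm2_series(lam, mu, terms=2000):
    """L = sum n P_n with P_n = rho^n / 2^(n-1) * P_0, summed numerically."""
    rho = lam / mu
    p0 = (2 - rho) / (2 + rho)
    return sum(n * 2 * p0 * (rho / 2) ** n for n in range(1, terms + 1))

lam, mu = Fraction(3), Fraction(2)
w2, w1 = w_mm2(lam, mu), w_mm1_fast(lam, mu)
wq2, wq1 = w2 - 1 / mu, w1 - 1 / (2 * mu)

assert w2 > w1    # one fast server gives the smaller time in system
assert wq2 < wq1  # two servers give the smaller time in queue
assert abs(l_mm2_series(3.0, 2.0) - float(lam * w2)) < 1e-9  # L = lambda * W
print(w2, w1, wq2, wq1)  # 8/7 1 9/14 3/4
```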
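13. (a)

$$\begin{aligned}
\lambda P_0 &= \mu P_1 \\
(\lambda + \mu) P_1 &= \lambda P_0 + 2\mu P_2 \\
(\lambda + 2\mu) P_n &= \lambda P_{n-1} + 2\mu P_{n+1}, \quad n \ge 2
\end{aligned}$$

These are the same balance equations as for the M/M/2 queue and have solution

$$P_0 = \frac{2\mu - \lambda}{2\mu + \lambda}, \qquad P_n = \frac{\lambda^n}{2^{n-1}\mu^n} P_0$$

(b) The system goes from 0 to 1 at rate

$$\lambda P_0 = \frac{\lambda(2\mu - \lambda)}{2\mu + \lambda}$$

The system goes from 2 to 1 at rate

$$2\mu P_2 = \frac{\lambda^2}{\mu} \cdot \frac{2\mu - \lambda}{2\mu + \lambda}$$

(c) Introduce a new state cl to indicate that the stock clerk is checking by himself.
The balance equation for $P_{cl}$ is

$$(\lambda + \mu) P_{cl} = \mu P_2$$

Hence,

$$P_{cl} = \frac{\mu}{\lambda + \mu} P_2 = \frac{\lambda^2 (2\mu - \lambda)}{2\mu(\lambda + \mu)(2\mu + \lambda)}$$

Finally, the proportion of time the stock clerk is checking is

$$P_{cl} + \sum_{n=2}^{\infty} P_n$$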
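
The stated solution of Exercise 13 can be checked against its balance equations with exact rational arithmetic. This is a minimal sketch: the function names and the rates $\lambda = 6$, $\mu = 8$ are hypothetical choices for illustration, not values from the exercise.

```python
from fractions import Fraction


def stationary_probability(n, lam, mu):
    """P_n for the two-checker system: P_0 = (2mu - lam)/(2mu + lam), P_n = lam^n / (2^(n-1) mu^n) P_0."""
    p0 = (2 * mu - lam) / (2 * mu + lam)
    if n == 0:
        return p0
    return lam**n / (2 ** (n - 1) * mu**n) * p0


def clerk_alone_probability(lam, mu):
    """P_cl from the balance equation (lam + mu) P_cl = mu P_2."""
    return mu / (lam + mu) * stationary_probability(2, lam, mu)


# Hypothetical rates with lam < 2*mu, kept as Fractions so the checks are exact.
lam, mu = Fraction(6), Fraction(8)
p = [stationary_probability(n, lam, mu) for n in range(6)]

print(lam * p[0] == mu * p[1])                              # first balance equation
print((lam + mu) * p[1] == lam * p[0] + 2 * mu * p[2])      # second balance equation
print(clerk_alone_probability(lam, mu))
```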


21. (a) $\lambda_1 P_{10}$.
(b) $\lambda_2 (P_0 + P_{10})$.
(c) $\lambda_1 P_{10} / [\lambda_1 P_{10} + \lambda_2 (P_0 + P_{10})]$.
(d) This is equal to the fraction of server 2’s customers that are type 1 multiplied
by the proportion of time server 2 is busy. (This is true since the amount of time
server 2 spends with a customer does not depend on which type of customer it
is.) By (c) the answer is thus

$$\frac{(P_{01} + P_{11}) \lambda_1 P_{10}}{\lambda_1 P_{10} + \lambda_2 (P_0 + P_{10})}$$


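24. The states are now $n$, $n \ge 0$, and $n'$, $n \ge 1$, where the state is $n$ when there are $n$ in the
system and no breakdown, and $n'$ when there are $n$ in the system and a breakdown
is in progress. The balance equations are

$$\begin{aligned}
\lambda P_0 &= \mu P_1 \\
(\lambda + \mu + \alpha) P_n &= \lambda P_{n-1} + \mu P_{n+1} + \beta P_{n'}, \quad n \ge 1 \\
(\beta + \lambda) P_{1'} &= \alpha P_1 \\
(\beta + \lambda) P_{n'} &= \alpha P_n + \lambda P_{(n-1)'}, \quad n \ge 2 \\
\sum_{n=0}^{\infty} P_n + \sum_{n=1}^{\infty} P_{n'} &= 1
\end{aligned}$$

In terms of the solution to the preceding,

$$L = \sum_{n=1}^{\infty} n (P_n + P_{n'})$$

and so

$$W = \frac{L}{\lambda_a} = \frac{L}{\lambda}$$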


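28. If a customer leaves the system busy, the time until the next departure is the time of
a service. If a customer leaves the system empty, the time until the next departure is
the time until an arrival plus the time of a service.

Using moment generating functions we get

$$\begin{aligned}
E[e^{sD}] &= \frac{\lambda}{\mu} E[e^{sD} \mid \text{system left busy}] + \left(1 - \frac{\lambda}{\mu}\right) E[e^{sD} \mid \text{system left empty}] \\
&= \frac{\lambda}{\mu} \left(\frac{\mu}{\mu - s}\right) + \left(1 - \frac{\lambda}{\mu}\right) E[e^{s(X+Y)}]
\end{aligned}$$

where $X$ has the distribution of interarrival times, $Y$ has the distribution of service
times, and $X$ and $Y$ are independent. Then

$$\begin{aligned}
E[e^{s(X+Y)}] &= E[e^{sX} e^{sY}] \\
&= E[e^{sX}] E[e^{sY}] \quad \text{by independence} \\
&= \left(\frac{\lambda}{\lambda - s}\right) \left(\frac{\mu}{\mu - s}\right)
\end{aligned}$$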
So,

$$E[e^{sD}] = \frac{\lambda}{\mu} \left(\frac{\mu}{\mu - s}\right) + \left(1 - \frac{\lambda}{\mu}\right) \left(\frac{\lambda}{\lambda - s}\right) \left(\frac{\mu}{\mu - s}\right) = \frac{\lambda}{\lambda - s}$$

By the uniqueness of generating functions, it follows that $D$ has an exponential
distribution with parameter $\lambda$.

36. The distributions of the queue size and busy period are the same for all three
disciplines; that of the waiting time is different. However, the means are identical. This
can be seen by using $W = L/\lambda$, since $L$ is the same for all. The smallest variance in
the waiting time occurs under first-come, first-served and the largest under last-come,
first-served.

39. (a) $a_0 = P_0$ due to Poisson arrivals. Assuming that each customer pays 1 per unit
time while in service the cost identity of Equation (8.1) states that

$$\text{average number in service} = \lambda E[S]$$

or

$$1 - P_0 = \lambda E[S]$$

(b) Since $a_0$ is the proportion of arrivals that have service distribution $G_1$ and $1 - a_0$
the proportion having service distribution $G_2$, the result follows.
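(c) We have

$$P_0 = \frac{E[I]}{E[I] + E[B]}$$

and $E[I] = 1/\lambda$ and thus,

$$E[B] = \frac{1 - P_0}{\lambda P_0} = \frac{E[S]}{1 - \lambda E[S]}$$

Now from parts (a) and (b) we have

$$E[S] = (1 - \lambda E[S]) E[S_1] + \lambda E[S] E[S_2]$$

or

$$E[S] = \frac{E[S_1]}{1 + \lambda E[S_1] - \lambda E[S_2]}$$

Substituting into $E[B] = E[S]/(1 - \lambda E[S])$ now yields the result.
(d) $a_0 = 1/E[C]$, implying that

$$E[C] = \frac{E[S_1] + 1/\lambda - E[S_2]}{1/\lambda - E[S_2]}$$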
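45. By regarding any breakdowns that occur during a service as being part of that service,
we see that this is an M/G/1 model. We need to calculate the first two moments of a
service time. Now the time of a service is the time $T$ until something happens (either
a service completion or a breakdown) plus any additional time $A$. Thus,

$$E[S] = E[T + A] = E[T] + E[A]$$

To compute $E[A]$, we condition upon whether the happening is a service or a
breakdown. This gives

$$\begin{aligned}
E[A] &= E[A \mid \text{service}] \frac{\mu}{\mu + \alpha} + E[A \mid \text{breakdown}] \frac{\alpha}{\mu + \alpha} \\
&= E[A \mid \text{breakdown}] \frac{\alpha}{\mu + \alpha} \\
&= \left(\frac{1}{\beta} + E[S]\right) \frac{\alpha}{\mu + \alpha}
\end{aligned}$$

Since $E[T] = 1/(\alpha + \mu)$ we obtain

$$E[S] = \frac{1}{\alpha + \mu} + \frac{\alpha}{\mu + \alpha} \left(\frac{1}{\beta} + E[S]\right)$$

or

$$E[S] = \frac{1}{\mu} + \frac{\alpha}{\mu\beta}$$

We also need $E[S^2]$, which is obtained as follows:

$$\begin{aligned}
E[S^2] &= E[(T + A)^2] \\
&= E[T^2] + 2E[AT] + E[A^2] \\
&= E[T^2] + 2E[A]E[T] + E[A^2]
\end{aligned}$$

The independence of $A$ and $T$ follows because the time of the first happening is
independent of whether the happening was a service or a breakdown. Now,

$$\begin{aligned}
E[A^2] &= E[A^2 \mid \text{breakdown}] \frac{\alpha}{\mu + \alpha} \\
&= \frac{\alpha}{\mu + \alpha} E[(\text{downtime} + S^*)^2] \\
&= \frac{\alpha}{\mu + \alpha} \left\{ E[\text{down}^2] + 2E[\text{down}]E[S] + E[S^2] \right\} \\
&= \frac{\alpha}{\mu + \alpha} \left\{ \frac{2}{\beta^2} + \frac{2}{\beta} \left[\frac{1}{\mu} + \frac{\alpha}{\mu\beta}\right] + E[S^2] \right\}
\end{aligned}$$

Hence, using $E[T^2] = 2/(\mu + \alpha)^2$ and $2E[A]E[T] = 2E[A]/(\mu + \alpha)$,

$$\begin{aligned}
E[S^2] = {} & \frac{2}{(\mu + \alpha)^2} + \frac{2}{\mu + \alpha} \left[ \frac{\alpha}{\beta(\mu + \alpha)} + \frac{\alpha}{\mu + \alpha} \left(\frac{1}{\mu} + \frac{\alpha}{\mu\beta}\right) \right] \\
& + \frac{\alpha}{\mu + \alpha} \left\{ \frac{2}{\beta^2} + \frac{2}{\beta} \left[\frac{1}{\mu} + \frac{\alpha}{\mu\beta}\right] + E[S^2] \right\}
\end{aligned}$$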
Now solve for $E[S^2]$. The desired answer is

$$W_Q = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])}$$

In the preceding, $S^*$ is the additional service needed after the breakdown is over and
$S^*$ has the same distribution as $S$. The preceding also uses the fact that the expected
square of an exponential is twice the square of its mean.

Another way of calculating the moments of $S$ is to use the representation

$$S = \sum_{i=1}^{N} (T_i + B_i) + T_{N+1}$$

where $N$ is the number of breakdowns while a customer is in service, $T_i$ is the time
starting when service commences for the $i$th time until a happening occurs, and $B_i$ is
the length of the $i$th breakdown. We now use the fact that, given $N$, all of the random
variables in the representation are independent exponentials with the $T_i$ having rate
$\mu + \alpha$ and the $B_i$ having rate $\beta$. This yields

$$\begin{aligned}
E[S \mid N] &= \frac{N + 1}{\mu + \alpha} + \frac{N}{\beta} \\
\text{Var}(S \mid N) &= \frac{N + 1}{(\mu + \alpha)^2} + \frac{N}{\beta^2}
\end{aligned}$$

Therefore, since $1 + N$ is geometric with mean $(\mu + \alpha)/\mu$ (and variance
$\alpha(\alpha + \mu)/\mu^2$) we obtain

$$E[S] = \frac{1}{\mu} + \frac{\alpha}{\mu\beta}$$

and, using the conditional variance formula,

$$\text{Var}(S) = \left[\frac{1}{\mu + \alpha} + \frac{1}{\beta}\right]^2 \frac{\alpha(\alpha + \mu)}{\mu^2} + \frac{1}{\mu(\mu + \alpha)} + \frac{\alpha}{\mu\beta^2}$$

52. $S_n$ is the service time of the $n$th customer; $T_n$ is the time between the arrival of the
$n$th and $(n + 1)$st customer.
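
The two routes to the moments of $S$ in Exercise 45 (conditioning on the first happening, and counting breakdowns) should agree, which gives a convenient exact check. The sketch below is only an illustration: the function names and the rates $\mu = 2$, $\alpha = 1$, $\beta = 1$ are hypothetical, and the recursion for $E[S^2]$ is solved using $E[S^2] = E[T^2] + 2E[A]E[T] + E[A^2]$ as in the solution.

```python
from fractions import Fraction


def moments_by_conditioning(mu, alpha, beta):
    """Return (E[S], E[S^2]) by conditioning on whether the first happening is a breakdown."""
    p_break = alpha / (mu + alpha)             # P{first happening is a breakdown}
    mean_s = 1 / mu + alpha / (mu * beta)      # E[S]
    mean_t = 1 / (mu + alpha)                  # E[T]
    second_t = 2 / (mu + alpha) ** 2           # E[T^2] for an exponential
    mean_a = p_break * (1 / beta + mean_s)     # E[A]
    # Terms of E[S^2] that do not involve E[S^2] itself.
    known = second_t + 2 * mean_a * mean_t + p_break * (2 / beta**2 + 2 * mean_s / beta)
    # E[S^2] = known + p_break * E[S^2], solved for E[S^2].
    second_s = known / (1 - p_break)
    return mean_s, second_s


def variance_by_counting_breakdowns(mu, alpha, beta):
    """Var(S) from the conditional variance formula, given the number of breakdowns N."""
    var_n = alpha * (alpha + mu) / mu**2       # Var(N) = Var(1 + N)
    return (1 / (mu + alpha) + 1 / beta) ** 2 * var_n + 1 / (mu * (mu + alpha)) + alpha / (mu * beta**2)


# Hypothetical rates, kept as Fractions so the comparison is exact.
mu, alpha, beta = Fraction(2), Fraction(1), Fraction(1)
mean_s, second_s = moments_by_conditioning(mu, alpha, beta)
variance = variance_by_counting_breakdowns(mu, alpha, beta)
print(mean_s, second_s, variance, second_s - mean_s**2 == variance)
```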


Chapter 9 


4. (a) $\phi(x) = x_1 \max(x_2, x_3, x_4)\, x_5$.
(b) $\phi(x) = x_1 \max(x_2 x_4, x_3 x_5)\, x_6$.
(c) $\phi(x) = \max(x_1, x_2 x_3)\, x_4$.


A minimal cut set has to contain at least one component of each minimal path set. 
There are six minimal cut sets: {1, 5}, {1, 6}, {2, 5}, {2, 3, 6}, {3, 4, 6}, {4, 5}. 
12. The minimal path sets are {1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 4}, {3, 5}. With $q_i = 1 - p_i$,
the reliability function is

$$\begin{aligned}
r(p) &= P\{\text{either of 1, 2, or 3 works}\}\, P\{\text{either of 4 or 5 works}\} \\
&= (1 - q_1 q_2 q_3)(1 - q_4 q_5)
\end{aligned}$$
17.

$$\begin{aligned}
E[N^2] &= E[N^2 \mid N > 0]\, P\{N > 0\} \\
&\ge (E[N \mid N > 0])^2 P\{N > 0\}, \quad \text{since } E[X^2] \ge (E[X])^2
\end{aligned}$$

Thus,

$$\begin{aligned}
E[N^2]\, P\{N > 0\} &\ge (E[N \mid N > 0]\, P\{N > 0\})^2 \\
&= (E[N])^2
\end{aligned}$$

Let $N$ denote the number of minimal path sets having all of its components
functioning. Then $r(p) = P\{N > 0\}$. Similarly, if we define $N$ as the number of minimal cut
sets having all of its components failed, then $1 - r(p) = P\{N > 0\}$. In both cases we
can compute expressions for $E[N]$ and $E[N^2]$ by writing $N$ as the sum of indicator
(i.e., Bernoulli) random variables. Then we can use the inequality to derive bounds
on $r(p)$.
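22. (a)

$$\bar{F}_t(a) = P\{X > t + a \mid X > t\} = \frac{P\{X > t + a\}}{P\{X > t\}} = \frac{\bar{F}(t + a)}{\bar{F}(t)}$$

(b) Suppose $\lambda(t)$ is increasing. Recall that

$$\bar{F}(t) = e^{-\int_0^t \lambda(s)\, ds}$$

Hence,

$$\frac{\bar{F}(t + a)}{\bar{F}(t)} = \exp\left\{ -\int_t^{t+a} \lambda(s)\, ds \right\}$$

which decreases in $t$ since $\lambda(t)$ is increasing. To go the other way, suppose
$\bar{F}(t + a)/\bar{F}(t)$ decreases in $t$. Now when $a$ is small

$$\frac{\bar{F}(t + a)}{\bar{F}(t)} \approx e^{-a\lambda(t)}$$

Hence, $e^{-a\lambda(t)}$ must decrease in $t$ and thus $\lambda(t)$ increases.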


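25. For $x \ge \xi$,

$$1 - p = \bar{F}(\xi) = \bar{F}(x(\xi/x)) \ge [\bar{F}(x)]^{\xi/x}$$

since IFRA. Hence, $\bar{F}(x) \le (1 - p)^{x/\xi} = e^{-\theta x}$.
For $x \le \xi$,

$$\bar{F}(x) = \bar{F}(\xi(x/\xi)) \ge [\bar{F}(\xi)]^{x/\xi}$$

since IFRA. Hence, $\bar{F}(x) \ge (1 - p)^{x/\xi} = e^{-\theta x}$.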
30. $r(p) = p_1 p_2 p_3 + p_1 p_2 p_4 + p_1 p_3 p_4 + p_2 p_3 p_4 - 3 p_1 p_2 p_3 p_4$

$$r(1 - F(t)) = \begin{cases}
2(1 - t)^2 (1 - t/2) + 2(1 - t)(1 - t/2)^2 - 3(1 - t)^2 (1 - t/2)^2, & 0 \le t \le 1 \\
0, & 1 \le t \le 2
\end{cases}$$

$$\begin{aligned}
E[\text{lifetime}] &= \int_0^1 \left[ 2(1 - t)^2 (1 - t/2) + 2(1 - t)(1 - t/2)^2 - 3(1 - t)^2 (1 - t/2)^2 \right] dt \\
&= \frac{31}{60}
\end{aligned}$$
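
The value 31/60 in Exercise 30 can be confirmed by numerical integration. The sketch below is a plain composite Simpson rule written for illustration; the function names are hypothetical, and $1 - t$ and $1 - t/2$ are read as the two component survival probabilities that appear in the displayed formula.

```python
def system_reliability_at(t: float) -> float:
    """r(1 - F(t)) on 0 <= t <= 1, with survival probabilities 1 - t and 1 - t/2."""
    a = 1.0 - t          # survival probability of the first kind of component
    b = 1.0 - t / 2.0    # survival probability of the second kind of component
    return 2 * a * a * b + 2 * a * b * b - 3 * a * a * b * b


def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


expected_lifetime = simpson(system_reliability_at, 0.0, 1.0)
print(expected_lifetime, 31 / 60)
```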
Chapter 10 
1. $B(s) + B(t) = 2B(s) + B(t) - B(s)$. Now $2B(s)$ is normal with mean 0 and variance
$4s$ and $B(t) - B(s)$ is normal with mean 0 and variance $t - s$. Because $B(s)$ and
$B(t) - B(s)$ are independent, it follows that $B(s) + B(t)$ is normal with mean 0 and
variance $4s + t - s = 3s + t$.

$$\begin{aligned}
E[B(t_1)B(t_2)B(t_3)] &= E[E[B(t_1)B(t_2)B(t_3) \mid B(t_1), B(t_2)]] \\
&= E[B(t_1)B(t_2)\, E[B(t_3) \mid B(t_1), B(t_2)]] \\
&= E[B(t_1)B(t_2)B(t_2)] \\
&= E[E[B(t_1)B^2(t_2) \mid B(t_1)]] \\
&= E[B(t_1)\, E[B^2(t_2) \mid B(t_1)]] \\
&= E[B(t_1)\{(t_2 - t_1) + B^2(t_1)\}] \quad (*) \\
&= E[B^3(t_1)] + (t_2 - t_1) E[B(t_1)] \\
&= 0
\end{aligned}$$

where the equality (*) follows since given $B(t_1)$, $B(t_2)$ is normal with mean $B(t_1)$ and
variance $t_2 - t_1$. Also, $E[B^3(t)] = 0$ since $B(t)$ is normal with mean 0.

$$\begin{aligned}
P\{T_1 < T_{-1} < T_2\} &= P\{\text{hit 1 before } {-1} \text{ before 2}\} \\
&= P\{\text{hit 1 before } {-1}\} \times P\{\text{hit } {-1} \text{ before 2} \mid \text{hit 1 before } {-1}\} \\
&= \frac{1}{2} P\{\text{down 2 before up 1}\} \\
&= \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}
\end{aligned}$$

The next to last equality follows by looking at the Brownian motion when it first
hits 1.


10. (a) Writing $X(t) = X(s) + X(t) - X(s)$ and using independent increments, we see
that given $X(s) = c$, $X(t)$ is distributed as $c + X(t) - X(s)$. By stationary
increments this has the same distribution as $c + X(t - s)$, and is thus normal
with mean $c + \mu(t - s)$ and variance $(t - s)\sigma^2$.
(b) Use the representation $X(t) = \sigma B(t) + \mu t$, where $\{B(t)\}$ is standard Brownian
motion. Using Equation (10.4), but reversing $s$ and $t$, we see that the conditional
distribution of $B(t)$ given that $B(s) = (c - \mu s)/\sigma$ is normal with mean
$t(c - \mu s)/(\sigma s)$ and variance $t(s - t)/s$. Thus, the conditional distribution of $X(t)$
given that $X(s) = c$, $s > t$, is normal with mean

$$\sigma \left[ \frac{t(c - \mu s)}{\sigma s} \right] + \mu t = \frac{(c - \mu s)t}{s} + \mu t$$

and variance

$$\frac{\sigma^2 t(s - t)}{s}$$

19. Since knowing the value of $Y(t)$ is equivalent to knowing $B(t)$, we have

$$\begin{aligned}
E[Y(t) \mid Y(u), 0 \le u \le s] &= e^{-c^2 t/2} E[e^{cB(t)} \mid B(u), 0 \le u \le s] \\
&= e^{-c^2 t/2} E[e^{cB(t)} \mid B(s)]
\end{aligned}$$

Now, given $B(s)$, the conditional distribution of $B(t)$ is normal with mean $B(s)$ and
variance $t - s$. Using the formula for the moment generating function of a normal
random variable we see that

$$\begin{aligned}
e^{-c^2 t/2} E[e^{cB(t)} \mid B(s)] &= e^{-c^2 t/2} e^{cB(s) + (t - s)c^2/2} \\
&= e^{-c^2 s/2} e^{cB(s)} \\
&= Y(s)
\end{aligned}$$

Thus $\{Y(t)\}$ is a Martingale.

$$E[Y(t)] = E[Y(0)] = 1$$

20. By the Martingale stopping theorem

$$E[B(T)] = E[B(0)] = 0$$

However, $B(T) = 2 - 4T$ and so $2 - 4E[T] = 0$, or $E[T] = \frac{1}{2}$.

24. It follows from the Martingale stopping theorem and the result of Exercise 18 that

$$E[B^2(T) - T] = 0$$

where $T$ is the stopping time given in this problem and

$$B(t) = \frac{X(t) - \mu t}{\sigma}$$

Therefore,

$$E\left[ \frac{(X(T) - \mu T)^2}{\sigma^2} - T \right] = 0$$

However, $X(T) = x$ and so the preceding gives that

$$E[(x - \mu T)^2] = \sigma^2 E[T]$$

But, from Exercise 21, $E[T] = x/\mu$ and so the preceding is equivalent to

$$\text{Var}(\mu T) = \sigma^2 \frac{x}{\mu} \quad \text{or} \quad \text{Var}(T) = \sigma^2 \frac{x}{\mu^3}$$
27. $E[X(a^2 t)/a] = (1/a) E[X(a^2 t)] = 0$. For $s < t$,

$$\begin{aligned}
\text{Cov}(Y(s), Y(t)) &= \frac{1}{a^2} \text{Cov}(X(a^2 s), X(a^2 t)) \\
&= \frac{1}{a^2} a^2 s = s
\end{aligned}$$

Because $\{Y(t)\}$ is clearly Gaussian, the result follows.


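30. (a) Starting at any time $t$ the continuation of the Poisson process remains a Poisson
process with rate $\lambda$.

(b)

$$\begin{aligned}
E[Y(t)Y(t + s)] &= \int_0^{\infty} E[Y(t)Y(t + s) \mid Y(t) = y]\, \lambda e^{-\lambda y}\, dy \\
&= \int_0^{s} y E[Y(t + s) \mid Y(t) = y]\, \lambda e^{-\lambda y}\, dy + \int_s^{\infty} y(y - s)\, \lambda e^{-\lambda y}\, dy \\
&= \int_0^{s} y \frac{1}{\lambda} \lambda e^{-\lambda y}\, dy + \int_s^{\infty} y(y - s)\, \lambda e^{-\lambda y}\, dy
\end{aligned}$$

where the preceding used that

$$E[Y(t)Y(t + s) \mid Y(t) = y] = \begin{cases}
y E(Y(t + s)) = \dfrac{y}{\lambda}, & \text{if } y < s \\
y(y - s), & \text{if } y > s
\end{cases}$$

Hence,

$$\text{Cov}(Y(t), Y(t + s)) = \int_0^{s} y e^{-\lambda y}\, dy + \int_s^{\infty} y(y - s)\, \lambda e^{-\lambda y}\, dy - \frac{1}{\lambda^2}$$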


Chapter 11 
1. (a) Let $U$ be a random number. If $\sum_{j=1}^{i-1} P_j < U \le \sum_{j=1}^{i} P_j$ then simulate from $F_i$.
(In the preceding $\sum_{j=1}^{i-1} P_j = 0$ when $i = 1$.)
(b) Note that

$$F(x) = \frac{1}{3} F_1(x) + \frac{2}{3} F_2(x)$$

where

$$F_1(x) = 1 - e^{-2x}, \quad 0 < x < \infty$$

$$F_2(x) = \begin{cases} x, & 0 < x < 1 \\ 1, & 1 < x \end{cases}$$

Hence, using part (a), let $U_1, U_2, U_3$ be random numbers and set

$$X = \begin{cases} \dfrac{-\log U_2}{2}, & \text{if } U_1 < \frac{1}{3} \\ U_3, & \text{if } U_1 > \frac{1}{3} \end{cases}$$

The preceding uses the fact that $-\log U_2 / 2$ is exponential with rate 2.
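
The composition step in part (b) can be sketched as follows. This is an illustration under stated assumptions: the function name, seed, and sample size are arbitrary, and the code draws only the random numbers it needs on each branch rather than always generating all three. Since both mixture components have mean 1/2, the sample mean should be close to 1/2.

```python
import math
import random


def sample_mixture(rng: random.Random) -> float:
    """Draw from F = (1/3) F1 + (2/3) F2 with F1 exponential(rate 2) and F2 uniform(0, 1)."""
    u1 = rng.random()
    if u1 < 1 / 3:
        u2 = 1.0 - rng.random()      # lies in (0, 1], so the logarithm is defined
        return -math.log(u2) / 2     # exponential with rate 2
    return rng.random()              # uniform on (0, 1)


rng = random.Random(12345)           # arbitrary seed, for repeatability only
draws = [sample_mixture(rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)
```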
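3. If a random sample of size $n$ is chosen from a set of $N + M$ items of which $N$ are
acceptable, then $X$, the number of acceptable items in the sample, is such that

$$P\{X = k\} = \frac{\binom{N}{k} \binom{M}{n - k}}{\binom{N + M}{n}}$$

To simulate $X$, note that if

$$I_j = \begin{cases} 1, & \text{if the } j\text{th selection is acceptable} \\ 0, & \text{otherwise} \end{cases}$$

then

$$P\{I_j = 1 \mid I_1, \ldots, I_{j-1}\} = \frac{N - \sum_{i=1}^{j-1} I_i}{N + M - (j - 1)}$$

Hence, we can simulate $I_1, \ldots, I_n$ by generating random numbers $U_1, \ldots, U_n$ and
then setting

$$I_j = \begin{cases} 1, & \text{if } U_j < \dfrac{N - \sum_{i=1}^{j-1} I_i}{N + M - (j - 1)} \\ 0, & \text{otherwise} \end{cases}$$

and $X = \sum_{j=1}^{n} I_j$ has the desired distribution.
Another way is to let

$$X_j = \begin{cases} 1, & \text{the } j\text{th acceptable item is in the sample} \\ 0, & \text{otherwise} \end{cases}$$

and then simulate $X_1, \ldots, X_N$ by generating random numbers $U_1, \ldots, U_N$ and then
setting

$$X_j = \begin{cases} 1, & \text{if } U_j < \dfrac{n - \sum_{i=1}^{j-1} X_i}{N + M - (j - 1)} \\ 0, & \text{otherwise} \end{cases}$$

and $X = \sum_{j=1}^{N} X_j$ then has the desired distribution.
The former method is preferable when $n \le N$ and the latter when $N \le n$.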
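
The first method in Exercise 3 (sequential selection indicators) can be sketched as below. The function name, the seed, and the parameter values $n = 5$, $N = 10$, $M = 20$ are illustrative assumptions; the sample mean is compared with the hypergeometric mean $nN/(N + M)$.

```python
import random


def simulate_acceptable_count(n: int, N: int, M: int, rng: random.Random) -> int:
    """Simulate X, the number of acceptable items in a sample of size n, one selection at a time."""
    if not 0 <= n <= N + M:
        raise ValueError("sample size must be between 0 and N + M")
    acceptable_so_far = 0
    for j in range(1, n + 1):
        # P{I_j = 1 | I_1, ..., I_{j-1}} = (N - sum of earlier I_i) / (N + M - (j - 1))
        p = (N - acceptable_so_far) / (N + M - (j - 1))
        if rng.random() < p:
            acceptable_so_far += 1
    return acceptable_so_far


rng = random.Random(2718)            # arbitrary seed, for repeatability only
n, N, M = 5, 10, 20                  # hypothetical sample and population sizes
draws = [simulate_acceptable_count(n, N, M, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, n * N / (N + M))
```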
6. Let


Hence $(d/d\lambda)\, c(\lambda) = 0$ when $\lambda = 1$ and it is easy to check that this yields the minimal
value of $c(\lambda)$.


16. (a) They can be simulated in the same sequential fashion in which they
are defined. That is, first generate the value of a random variable $I_1$
such that

$$P\{I_1 = i\} = \frac{w_i}{\sum_{j=1}^{n} w_j}, \quad i = 1, \ldots, n$$

Then, if $I_1 = k$, generate the value of $I_2$ where

$$P\{I_2 = i\} = \frac{w_i}{\sum_{j \ne k} w_j}, \quad i \ne k$$

and so on. However, the approach given in part (b) is more efficient.
(b) Let $I_j$ denote the index of the $j$th smallest $X_i$.


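23. Let $m(t) = \int_0^t \lambda(s)\, ds$, and let $m^{-1}(t)$ be the inverse function. That is,
$m(m^{-1}(t)) = t$.
(a)

$$\begin{aligned}
P\{m(X_1) > x\} &= P\{X_1 > m^{-1}(x)\} \\
&= P\{N(m^{-1}(x)) = 0\} \\
&= e^{-m(m^{-1}(x))} \\
&= e^{-x}
\end{aligned}$$

(b)

$$\begin{aligned}
& P\{m(X_i) - m(X_{i-1}) > x \mid m(X_1), \ldots, m(X_{i-1}) - m(X_{i-2})\} \\
&\quad = P\{m(X_i) - m(X_{i-1}) > x \mid X_1, \ldots, X_{i-1}\} \\
&\quad = P\{m(X_i) - m(X_{i-1}) > x \mid X_{i-1}\} \\
&\quad = P\{m(X_i) - m(X_{i-1}) > x \mid m(X_{i-1})\}
\end{aligned}$$

Now,

$$\begin{aligned}
P\{m(X_i) - m(X_{i-1}) > x \mid X_{i-1} = y\} &= P\left\{ \int_y^{X_i} \lambda(t)\, dt > x \,\Big|\, X_{i-1} = y \right\} \\
&= P\{X_i > c \mid X_{i-1} = y\} \quad \text{where } \int_y^{c} \lambda(t)\, dt = x \\
&= P\{N(c) - N(y) = 0 \mid X_{i-1} = y\} \\
&= P\{N(c) - N(y) = 0\} \\
&= \exp\left\{ -\int_y^{c} \lambda(t)\, dt \right\} \\
&= e^{-x}
\end{aligned}$$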
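
The result of Exercise 23 says that $m(X_1), m(X_2) - m(X_1), \ldots$ are independent rate 1 exponentials, so event times can be generated as $X_i = m^{-1}(E_1 + \cdots + E_i)$. The sketch below illustrates this for one assumed intensity, $\lambda(t) = 2t$, so that $m(t) = t^2$ and $m^{-1}(x) = \sqrt{x}$; the intensity, function name, seed, and horizon are hypothetical choices, not part of the exercise.

```python
import math
import random


def event_times(horizon: float, rng: random.Random) -> list:
    """Event times in [0, horizon] for intensity lambda(t) = 2t (assumed), via X_i = m^{-1}(E_1 + ... + E_i)."""
    target = horizon**2              # m(horizon) for m(t) = t^2
    times = []
    transformed = 0.0                # running sum of rate 1 exponentials, i.e. m(X_i)
    while True:
        transformed += rng.expovariate(1.0)
        if transformed > target:
            return times
        times.append(math.sqrt(transformed))   # m^{-1}(x) = sqrt(x)


rng = random.Random(7)               # arbitrary seed, for repeatability only
horizon = 2.0
counts = [len(event_times(horizon, rng)) for _ in range(20_000)]
mean_count = sum(counts) / len(counts)
print(mean_count, horizon**2)        # the number of events by the horizon has mean m(horizon)
```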


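32.

$$\begin{aligned}
\text{Var}[(X + Y)/2] &= \frac{1}{4} [\text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)] \\
&= \frac{\text{Var}(X) + \text{Cov}(X, Y)}{2}
\end{aligned}$$

Now it is always true that

$$\frac{\text{Cov}(V, W)}{\sqrt{\text{Var}(V)\,\text{Var}(W)}} \le 1$$

and so when $X$ and $Y$ have the same distribution $\text{Cov}(X, Y) \le \text{Var}(X)$.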
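
The inequality in Exercise 32 can be illustrated with one concrete pair of identically distributed variables. The choice $X = e^{U}$, $Y = e^{1-U}$ with $U$ uniform on (0, 1) is an assumption made here purely for illustration; the seed and sample size are arbitrary. The sketch checks the variance decomposition on sample moments and shows that the average has a smaller sample variance than $X$ alone.

```python
import math
import random
import statistics

rng = random.Random(2024)            # arbitrary seed, for repeatability only
us = [rng.random() for _ in range(50_000)]
x = [math.exp(u) for u in us]        # X = e^U
y = [math.exp(1.0 - u) for u in us]  # Y = e^(1-U), same distribution as X
avg = [(a + b) / 2 for a, b in zip(x, y)]

var_x = statistics.variance(x)
var_y = statistics.variance(y)
var_avg = statistics.variance(avg)

# Sample covariance with the same n - 1 normalisation as statistics.variance.
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

print(var_x, var_avg, (var_x + var_y + 2 * cov_xy) / 4)
```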
Absorbing state of a Markov chain, 194
Accessibility of states in a Markov chain, 204 
Age of the renewal process, 452-453 
Algorithmic efficiency, 234-237 

Algorithms, analyzing probabilistic, 237-241 
Alias method in simulation, 691-695 

Aloha protocol, 211-214 

Alternating renewal process, 450-451 
Antithetic variables, 707-710 

Aperiodic state of a Markov chain, 214 
Arbitrage, defined, 640 

Arbitrage theorem, 640-641 

Arrival theorem, 534-535 

Availability, system, 616 


B 
Balance equations, 392, 504 
Balking, M/M/1 queueing system, 517 
Ballot problem, 130-132, 185 
Bayes’ formula, 12-15 
Bernoulli random variables, 26-27, 36-37, 47, 
54-55, 296, 719 
independent, 125-126 
Best prize problem, 126-127 
Beta distribution, 685-686 
Beta random variable, 61 
relation to gamma, 60-61 
simulation of, 675, 685-686 
Binomial random variables, 27-29, 37, 47 
simulating, 690 
sums of independent, 67 
variance of, 54 
Binomials, negative, 172-173 
Birth and death model, 371 
Birth and death processes, 374-381 
ergodic, 399 
forward equations for, 390 
Birth and death queueing models, 517-522 
Bivariate exponential distribution, 368 


Bivariate normal distribution, 174 
Bivariate Poisson process, 360 
Black-Scholes option pricing formula, 644-649 
Bonferroni’s inequality, 16 
Bonus Malus automobile insurance system, 
194-195, 229-230 
Boole’s inequality, 16 
Bose-Einstein statistics, 149-153 
Box-Muller approach, 682 
Branching processes, 245-249 
Bridge 
Brownian, 652 
structure, 594-595 
system defined, 584 
Brown, Robert, 632 
Brownian bridge, 652 
Brownian motion 
integrated, 653 
standard, 633 
variations on, 636-638 
Brownian motion and stationary processes, 
631-666 
Gaussian processes, 651-654 
harmonic analysis of weakly stationary 
processes, 659-661 
hitting times and gambler’s ruin problem, 
635-636 
maximum variable and gambler’s ruin 
problem, 635-636 
pricing stock options, 638-649 
stationary and weakly stationary processes, 
654-659 
variations on Brownian motion, 636-638 
white noise, 649-651 
Busy period, 347, 454-455, 540-541, 558 


C 


Cayley’s theorem, 597 
Central limit theorem, 80, 83-84 
Central limit theorem for renewal processes,
438-439 
Chapman-Kolmogorov equations, 195-204, 
385 
Chebyshev’s inequality, 78-79 
Chi-squared distribution, 684-685 
Chi-squared random variable with n degrees of 
freedom, 72, 180 
Class of Markov chain, 205 
Closed class of a Markov chain, 277 
Communications with Markov chain, 204 
Complement of event, 3 
Compound Poisson process, 346-351 
Compound Poisson random variable, 120 
Compound random variables 
defined, 109, 167 
identities, 167-169 
variances of, 119-120 
Compound random variables, identity for, 
166-173 
binomial compounding distribution, 171 
compounding distribution related to negative 
binomials, 172-173 
Poisson compounding distribution, 169-170 
Compounding distribution, 167 
Conditional expectation, 98, 103, 106-107 
Conditional expectation, conditional 
probability and, 97-189 
computing expectations by conditioning, 
106-121 
computing probabilities by conditioning, 
122-139 
continuous case, 102-106 
discrete case, 97-102 
identity for compound random variables, 
166-173 
miscellaneous applications, 140-166 
Conditional or mixed Poisson processes, 
351-354 
Conditional probability, 7-10, 139 
Conditional probability and conditional 
expectation 
computing expectations by conditioning, 
106-121 
computing probabilities by conditioning, 
122-139 
continuous case, 102-106 
discrete case, 97-102 
identity for compound random variables, 
166-173 
miscellaneous applications, 140-166 


Conditional probability density function, 
102-103 
Conditional probability mass function, 98-99 
Conditional variance, 119 
Conditional variance formula, 119 
Conditioning, computing probabilities by, 
122-139, 199-200 
Connected graph, defined, 147 
Continuous random variables, 24, 31-36 
Continuous-time process, 84 
Convolution, 56 
Correlation, 175 
Counting processes, 312-313 
Coupling from past, 723-725 
Coupon collecting problem, 322-325 
Covariance 
in M/M/1 queueing system, 569 
properties of, 52-53 
and variance of sums of random variables, 
50-59 
Coxian random variables, 311-312 
Craps, 16 
Cumulative distribution function (cdf), 24, 26 
Cut vector, defined, 584 


D 

Debugging software, 337 

Decreasing failure rate (DFR), 603-604, 
606-607 

Delayed renewal process, 466, 486 

Dependent events, 10 

Dirichlet distributions, defined, 151 

Discrete distributions simulating from, 688-695 

Discrete random variables, 24 

Discrete-time process, 84 

Distribution function, 24 

Distributions, simulating from discrete, 
688-695 

Doubly stochastic, 278 


E 

e, 132-133 

Ehrenfest, P. and T., 252 

Ehrenfest urn model, 252, 284 
Einstein, A., 633 

Elementary renewal theorem, 432-433 
Equilibrium distribution, 455 
Ergodic birth and death process, 399 
Ergodic, defined, 215

Ergodic Markov chain, 255 

Erlang’s loss system, 563-564 
Event, 2
Events that occur, distribution of, 74-77 
Excess life of renewal process, 446-447, 
453-454 
Expectation, conditional, 97, 103. see also 
Conditional expectation, conditional 
probability and 
Expectations, computing by conditioning, 
106-121 
computing variances by conditioning, 
117-121 
Expected discounted return, 288-289 
Expected system lifetime, 610-616 
Expected time to maximal run of distinct 
values, 474-476 
Expected value, 36, 38-39, 92-93 
of Bernoulli random variables, 36-37, 47 
of binomial random variables, 37, 47 
of compound random variables, 108-109 
of continuous random variables, 38-40 
of discrete random variables, 36-38 
of exponential random variables, 39, 
292-293 


of functions of random variables, 40-44, 46 


of geometric random variables, 37-38, 
108-110 


of hypergeometric random variables, 54-57 


of normal random variables, 39-40 
of Poisson random variables, 38 
of random variables, 36-44
of sum of random number of random 
variables, 108-109 
of sums of random variables, 47-48 
tables of, 66 
of time until k consecutive successes, 
113-114 
of total discounted reward, 178, 293-294 
of uniform random variables, 39 
Exponential distribution, 292-312, 373, 
686-688 
further properties of, 301-308 
Exponential distribution and Poisson process 
convolutions of exponential random 
variables, 308-312 
definition, 292-293 
exponential distribution, 292-312 


further properties of exponential distribution, 


301-308 
properties of exponential distribution, 
294-301 
Exponential queuing models, 502-527 
birth and death, 517-522 


queueing system with bulk service, 524-527 


shoeshine shop, 522-524 

single-server, 502-511 

single-server, finite capacity, 511-517 
Exponential random variables, 34, 39 

convolutions of, 308-312 

mixtures of, 606-607 

rate of, 299-301 

simulating, 673 

Von Neumann algorithm, 686-688 


F 

Failure rate 
decreasing, 603-604, 606-607 
increasing, 579, 608 

Failure rate function, 299, 311, 355 
discrete time, 311 

Feedback, queueing, 531 

First-come, first-served (FIFO) ordering, 

543-544 


G 
Gambler’s ruin problem, 230-232 
application to drug testing, 232-234 
hitting times and, 635-636 
maximum variable and, 635-636 
Gamma distribution, 604, 684 
Gamma random variables, 34 
independent, 60 
Gaussian processes, 651-654 
General renewal process, 466, 486 
Geometric Brownian motion, 636-638, 662 
Geometric distribution 
mean of, 109-110 
simulation of, 689 
Geometric random variable, 29, 37-38 
variance of, 118-119 
Gibbs sampler, 263-264, 536-537 
G/M/1 queues, 553-558 
busy/idle periods, 558 
G/M/k queues, 565-567 
Graph and components, 596 
Graphs, random, 141-149, 596-599 
Greedy algorithms, analyzing, 303-305 


H 
Hardy-Weinberg law, 217-219 
Hastings-Metropolis algorithm, 261-264 
Hazard function, defined, 608 
Hazard rate function, 299, 311 
Hazard rate method, 677-680 
discrete, 728-729 
Hit-miss method, 732 
Hitting time theorem, 162-164
Hyperexponential random variable, 300 
Hypergeometric, defined, 54, 56 
Hypergeometric distribution, defined, 100 
Hypoexponential random variable, 308 


I 
Idle period, 347, 540 
IFR lifetime distribution, 605 
IFR, parallel system that is not, 606 
Ignatov’s theorem, 135, 157-159 
Impulse response function, 659 
Inclusion and exclusion, method of, 591-599 
Inclusion-exclusion bounds, 593 
Inclusion-exclusion identity, 6 
Inclusion-exclusion theorem, 136-138 
Increasing failure on the average (IFRA), 608 
Increasing failure rate (IFR), 603 
Increasing failure rate on the average (IFRA), 
579 
Independent events, 10-12 
pairwise, that are not independent, 11 
Independent increments, 312 
Independent random variables, 48-49, 322 
sum of, 65 
Indicator random variable for event, 23 
Inspection paradox, 460-463 
Instantaneous transition rates, defined, 384
Insurance, 122-123, 138-139, 334-335, 
353-354, 451-452 
Bonus Malus system, 194-195, 229-230 
Insurance ruin problem, 478-484 
Interarrival time, sequence of, 317 
Intersection of event, 3 
Inventory example, 455-457 
Inverse transformation method, 672-673 
discrete analog, 689 
Inversions, 96 
Irreducible Markov chain, 205 
Irrelevant system component, 624 


J 


Jacobian determinant, 61 
Joint cumulative probability distribution 
function, 44 
Joint density functions, 51-52, 61 
Joint moment generating function, 96 
Joint probability 
computing, 333-334 
density function, 45 
distribution of functions of random variables, 
59-61 


mass function, 45 
of sample mean and sample variance from 
normal population, 71-74 
Jointly continuous random variables, 45 
Jointly distributed random variables, 44-61 
joint distribution functions, 44-48 


K 


Kolmogorov’s backward equation, 386 
Kolmogorov’s forward equation, 389 
k-out-of-n structure, 581 
k-out-of-n system, 583, 612-613 

with equal probability, 587 

with identical components, 605-606 
k-record index, 158 
k-record values of discrete random variables,
157-160


L
$L = \lambda_a W$, 499
$L_Q = \lambda_a W_Q$, 500


Laplace transform, 69, 316 
Left skip free random walks, 160-166 
Limit theorems, 77-84 
Limiting probabilities, 390-397 
Linear birth rate, 375 
Linear growth model with immigration, 
375-377 
Linear program 
defined, 268 
optimization problem, 234 
List model, 140-141 
Little o notation, 314 
Little’s formula, 499 
Long run proportion of time, 215 


M 
Markov chain generated data, mean pattern 
times in, 225-229 
Markov chain in genetics, 217-219 
Markov chains, 191-290 
branching processes, 245-249 
Chapman-Kolmogorov equations, 195-204 
classification of states, 204-214 
continuous-time, 372-374, 386-389 
defined, 192 
ergodic, 255 
expected discounted return, 288-289 
hidden Markov chains, 269-275 
independent time reversible continuous-time, 
404 
irreducible, 258 
limiting probabilities, 214-230 


Index 
Markov decision processes, 265-268
mean time spent in transient states, 243-245 
miscellaneous applications, 230-242 
Monte Carlo methods, 260-265 
simulation, 723-726 
time reversible, 249-260 
time reversible Markov chains, 249-260 
transition probability matrix, 192 
Markov chains, continuous-time, 371-419 
birth and death processes, 374-381 
computing transition probabilities, 409-411 
limiting probabilities, 390-397 
time reversibility, 397-405 
transition probability function, 381-390 
uniformization, 406-409 
Markov chains, hidden, 269-275 
defined, 269 
forward/backward approach, 272 
predicting states, 273-275 
Markov decision processes, 265-268 
Markov processes. see Semi-Markov processes 
Markovian property, 237, 238, 239 
Markov’s inequality, 77-79 
Martingale process, 649, 663 
Match problem, 127-130 
Matching rounds problem, 111-113, 120-121 
Mean 
expected value, 42 
of geometric distribution, 109-110 
joint distribution of sample, 71-74 
pattern times in Markov chain generated 
data, 225-229 
Poisson distribution with, 63-64 
Poisson random variable with, 345 
sample, 53 
value function, 342, 425 
Mean time 
for patterns, 153-157 
spent in transient states, 243-245 
Mean value analysis of queueing networks, 535 
Memory, lack of, 298, 317 
Memoryless random variable, 294, 373 
M/G/1 queueing system, 538-553 
busy periods, 540-541 
optimization example, 546-550 
priority queues, 543-546 
random-sized batch arrivals, 541-543 
server breakdown, 550-553 
simulating, 714-715 
variations, 541-553 
work, 538-540 


M/G/k queues, 567-568
Minimal cut set, 584
Minimal cut vector, 584
Minimal path and minimal cut sets, 582-586
Minimal path set, 582
Minimal path vector common, 582
Mixed Poisson processes, conditional or,
351-354
Mixture of distributions, 606
M/M/1 queues, 517
M/M/1 queues with balking, 517
M/M/k queues, 517-518, 564
Moment and expected value, 42
Moment generating functions, 62-74, 65
of binomial random variables, 63
defined, 62
determines distribution, 67
of exponential random variables, 64,
292-293
of normal random variables, 64-67
of Poisson random variables, 63-64
of the sum of independent random variables,
65
tables of, 66
Monte Carlo methods, Markov chain, 260-265
Monte Carlo simulation approach, defined, 668
Monte Carlo’s simulation, 260
Moving average process, 658
Multinomial distribution, 88
Multivariate normal distribution, 70-71
Mutually exclusive events, 3


N
Negative binomial distribution, defined, 88
Negative binomials, compounding distribution
related to, 172-173
New better than used (NBU), 471
Noncentral chi-squared random variables, 179
Nonhomogeneous Poisson process
conditional distribution of event times,
366-367
mean value function of, 342
simulating, 697-703, 711-712
Normal random variables, 34-36, 39-40, 94
as approximations of the binomial, 80-81
simulation of, 675-677, 680-684
Null event, 3
Null recurrent state of a Markov chain, 281


O


Occupation time, 408, 418 
Odds, 641-642 
Options, pricing stock, 638-649
Order statistics, 58-59 

simulation of, 729 
Ornstein-Uhlenbeck process, 656-657 


P 
Pairwise independent events, 11 
Parallel structure, 580-581 
Parallel system, 579, 587, 603, 617-620 
expected life of, 614-616 
that is not IFR, 606 
Path vector, 582 
Patterns 
of discrete random variables, 467-474 
mean time for, 153-157 
Patterns, applications to, 463-478 
expected time to maximal run of distinct 
values, 474-476 
increasing runs of continuous random 
variables, 476-478 
patterns of discrete random variables, 
467-474 
Period of a state of a Markov chain, 214 
Poisson arrival queues, single-server, 347-351 
Poisson compounding distribution, 166-170 
Poisson distribution, 58 
Poisson distribution with mean, 63-64 
Poisson paradigm, 68-69 
Poisson process, 312-339 
bivariate, 359 
compound, 346-351 
conditional distribution of arrival times, 
325-336, 700-701 
counting processes, 312-313 
definition of, 313-316 
estimating software reliability, 336-339 
interarrival and waiting time distributions, 
316-319 
miscellaneous properties of Poisson 
processes, 319-325 
nonhomogeneous, 339-345 
sampling, 327, 697-700 
simulating event times, 702-703 
simulating two-dimensional, 703-705 
Poisson process, exponential distribution and, 
291-369 
convolutions of exponential random 
variables, 308-312 
definition, 292-293 
exponential distribution, 292-312 
further properties of exponential distribution, 
301-308 


properties of exponential distribution, 
294-301 
Poisson process, generalization of, 339-354 
compound Poisson process, 346-351 
conditional or mixed Poisson processes, 
351-354 
Poisson process having rate, 313 
Poisson processes 
conditional or mixed, 351-354 
miscellaneous properties of, 319-325 
Poisson queue, output process of infinite server, 
344-345 
Poisson random variables, 30-31, 100, 316, 
329 
approximating binomial random variables, 
30 
approximation to the binomial, 316 
expectation of, 38 
maximum probability, 89 
with mean, 345 
random sampling, 123-125 
simulating, 690-691
sums of independent, 70 
Polar method, defined, 684 
Pollaczek-Khintchine formula, 539 
Polya’s urn model, 149-153, 177 
Positive recurrent state of a Markov chain, 215 
Power spectral density, 660 
Priority queues, 543-546 
Probabilistic algorithm, 241 
Probabilistically process, 317 
Probabilities 
computing, 122-139 
computing joint, 333-334 
conditional, 7-10, 97-189 
defined on events, 4-6 
inclusion-exclusion identity, 6 
limiting, 214-230, 390-397 
of matches, 127-130 
stationary, 222-225 
steady-state, 500-502 
transition, 409-411 
Probability density function, 31, 42, 45-46 
conditional, 102-103 
joint, 45 
Probability distribution function, joint 
cumulative, 44 
Probability function, transition, 381-390 
Probability mass function, 25, 135 
conditional, 98-99 
joint, 45 
Probability theory, introduction to, 1-21 
Bayes’ formula, 12-15
conditional probabilities, 7-10 
independent events, 10-12 
probabilities defined on events, 4-6 
sample space and events, 1-4 

Pure birth process, 371 


Q 
Queueing system, 377-378 
birth and death queueing models, 517-522 
with bulk service, 524-527 
busy period, 347, 454-455, 540-541, 558 
with finite capacity, 511-517 
multiserver exponential, 378-379 
with renewal arrivals, 449 
simulating, 710, 716-717 
single-server exponential, 502-511 
single-server exponential, finite capacity, 
511-517 
waiting times, 508 
Queueing theory, 497-578 
cost equations, 499-500 
departure process, 399-400 
exponential models, 502-527 
finite source model, 559-562 
G/M/1 system, 553-558 
G/M/k queues, 565-567 
M/G/1 system, 538-553
M/G/k queues, 567-568 
multiserver queues, 562-568 
network of queues, 527-537 
Arrival theorem, 534-535 
Gibbs sampler analysis, 536-537 
mean value analysis, 535 
preliminaries, 498-502 
priority queues, 543-546 
random-sized batch arrivals, 541-543 
steady-state probabilities, 500-502 
tandem queues, 528 
tandem/sequential queues, 416 
transition diagram, 523 
work in queues, 538 
Queues 
infinite server, 327-329, 454-455 
limiting probabilities, 395 
multiserver, 562-568 
output process, 344-345 
single-server Poisson arrival, 347-351 
Queues, network of, 527-537 
closed systems, 532-537 
open systems, 527-532 
Quick sort algorithm, 114-117 


R 
Random graph, 141-149, 596-599 
Random numbers, 668, 669 
Random permutations, generating, 670-671 
Random subset, 730 
Random telegraph signal process, 658 
Random variable, expectation 
continuous case, 38-40 
discrete case, 36-38 
expectation of function of random variable, 
40-44 
Random variable identity, compound, 169-170 
Random variable with mean, Poisson, 345 
Random variables, 21-94 
Bernoulli, 26-27, 36-37, 47, 54-55, 
125-126, 296, 719 
binomial, 27-29, 37, 47 
chi-squared, 72, 180 
compound, 109, 167 
continuous, 24, 31-36, 42, 476-478 
convolutions of exponential, 308-312 
covariance and variance of sums of, 50-59 
Coxian, 311-312 
defined, 21 
discrete, 24, 25-31, 157-160, 467-474 
expectation of, 36-44 
expectation of sum of random number of, 
108-109 
expectations of functions of, 40-44 
exponential, 34, 39 
gamma, 34 
geometric, 29, 37-38 
hyperexponential, 300 
hypoexponential, 308 
independent, 48-49, 322 
independent gamma, 60 
indicator, 23 
joint density function, 51-52, 61 
joint probability distribution of functions of, 
59-61 
jointly distributed, 44-61 
left skip free random walks, 160-166 
limit theorems, 77-84 
moment generating functions, 62-74 
noncentral chi-squared, 179 
normal, 34-36, 39-40 
Poisson. see Poisson random variables 
simulating binomial, 690 
simulating normal, 675-677 
simulating Poisson, 690-691 
stochastic processes, 83-85 
sum of independent, 65 
sum of two independent uniform, 57 
sums of independent binomial, 67 
sums of independent normal, 67-68 
sums of independent Poisson, 57-58, 70 
uniform, 32-33, 39 
variance of binomial, 54 
variance of compound, 119-120 
variance of geometric, 118-119 
Random variables, identity for compound, 
166-173 
binomial compounding distribution, 171 
compounding distribution related to negative 
binomials, 172-173 
Poisson compounding distribution, 169-170 
Random variables, simulating continuous, 
672-688 
beta distribution, 685-686 
chi-squared distribution, 684-685 
exponential distribution, 686-688 
gamma distribution, 684 
hazard rate method, 677-680 
inverse transformation method, 672-673 
normal distribution, 680-684 
rejection method, 673-677 
Von Neumann algorithm, 686-688 
Random walk, 237-241 
left skip free, 160-166 
Markov chains, 194 
process, 437 
symmetric, 209 
Rate of distribution, defined, 299 
Rate of renewal process, 429 
Rates, instantaneous transition, 384 
Records, 93, 367 
Recurrent states, positive, 214 
Recurring states, 205-207 
Regenerative processes, 447-457 
alternating renewal processes, 450-451 
Rejection method, 673-677 
discrete, 728 
Relevant system component, 624 
Reliability, estimating software, 336-339 
Reliability function, 609 
simulating, 709 
Reliability function, bounds on, 590-602 
method of inclusion and exclusion, 591-599 
second method for obtaining bounds, 
600-601 
Reliability theory, 579-629 
bounds on reliability function, 590-602 
expected system lifetime, 610-616 
reliability of systems of independent 
components, 586-590 
structure functions, 580-586 
system life as function of component lives, 
602-610 
systems with repair, 616-620 
Renewal arrivals, queueing system with, 449 
Renewal, cycle and, 440 
Renewal function 
computing, 463-466 
defined, 425 
estimating, 712-713 
Renewal processes 
age of, 452-453 
alternating, 450-451 
average age of, 445-446 
average excess of, 446-447 
central limit theorem for, 438-439 
defined, 421 
excess of, 453-454 
and interarrival distribution, 459 
Renewal reward processes, 439-447 
Renewal theory and its applications, 421-495 
Reversible chain, time, 403 


S 


Sample mean, defined, 53 
Sample space 
defined, 1 
and events, 1-4 
Sample variance, 74 
Satisfiability problem, 237-242 
Second-order stationary process, 656 
Semi-Markov processes, 457-460 
Series system, 579, 586, 603, 617, 620 
of uniformly distributed components, 611 
Shuffling, 279 
Signed rank test, 188 
Simplex algorithm, 234, 268 
Simulation, 667-734 
alias method, 691-695 
coupling from past, 723-725 
determining number of runs, 722-723 
Markov chains, another approach, 725-726 
simulating continuous random variables, 
672-688, 680-688 
simulating from discrete distributions, 
688-695 
from stationary distribution of Markov 
chain, 723-726 
stochastic processes, 696-705 
variance reduction techniques, 706-722 
Skip free random walks, 160-166 
Snyder’s ratio of genetics, 284 
Software reliability, estimating, 336-339 
Sorting, 729 
Spanning trees, 597 
Standard Brownian motion, 633 
Standard normal distribution, 36 
Standard normal distribution function, 81 
Stationary and weakly stationary processes, 
654-659 
Stationary increments, 313 
Stationary probabilities, 222-225 
Stationary processes 
harmonic analysis of weakly, 659-661 
second-order, 656 
stationary and weakly, 654-659 
weakly, 656 
Stationary processes, Brownian motion and, 
631-666 
Stationary transition probabilities, 372 
Stirling’s approximation, 146, 213-214, 236 
Stochastic processes, 84-86, 696-705 
Brownian motion, 632 
conditional distribution of arrival times, 
700-701 
Gaussian processes, 651 
index set of, 84 
simulating nonhomogeneous Poisson process, 
697-703 
simulating two-dimensional Poisson process, 
703-705 
state of, 84 
state space of, 84 
Stopping time, 486, 678-679 
Stratified sampling, 732 
Strong law for renewal processes, 427-428 
Strong law of large numbers, 79-80 
Structure functions, 580-586 
Suspended animation, series model with, 
620-622 
Symmetric random walk, 209 
relation to Brownian motion, 631-632 


T 

Taylor series expansion, 83 
T-distribution, 180 
Thinning algorithm in simulation, 698 
Throughput rate, 533 
Tilted density, 718 
Time reversibility, 397-405 
Time reversible chain, 403 
Time reversible continuous-time Markov 
chains, independent, 404 
Time reversible equations, 254 
Time reversible Markov chains, 249-260 
Times, conditional distribution of arrival, 
325-336, 700-701 
Transient states, 205-206 
mean time spent in, 243-245 
Transition probabilities, computing, 409-411 
Transition probability function, 381-390 
Transition probability matrix, 192 
Transition rates, instantaneous, 384 
Tree process, 288 
Truncated chain, 403 
Two state continuous time Markov chain, 
386-389 
Two-dimensional Poisson process, simulating, 
703-705 


U 

Uniform priors, 149-153 
Uniform random variables, 32-33, 39, 57 
Uniformization, 406-409 
Union of events, 3 
Unit normal distribution, 36 


V 


Variance reduction techniques, 706-722 
control variates, 715-717 
importance sampling, 717-722 
reduction by conditioning, 710-715 
using antithetic variables, 707-710 
variance reduction by conditioning, 710-715 
Variances 
of binomial random variables, 54 
of compound random variables, 119-120 
computing, 117-121 
of exponential random variables, 292-293 
of geometric random variables, 118-119 
of hypergeometric random variables, 54-57 
joint distribution of sample, 71-74 
in matching rounds problem, 120-121 
of normal random variables, 43-44 
of the number of renewals, 465 
of Poisson random variables, 64 
sample, 74 
of sums of random variables, 50-59 
tables of, 66 
Viterbi algorithm, defined, 275 
Von Neumann algorithm, 686-688 
Von Neumann rejection method, 674 


W 

Waiting time, 317 
Waiting time distributions, interarrival and, 
316-319 
Wald’s equation, 486-489, 558, 560, 679-680 
defined, 486 
Weak dependence, 69 
Weak law of large numbers, 94 
Weakly stationary process, 656 
Weakly stationary processes 
harmonic analysis of, 659-661 
stationary and, 654-659 
Weibull distribution, 603 
White noise, 649-651 
Wiener, N., 632, 633 
Wiener process, 632 
Work in queue, 538 


Y 
Yule process, 383, 413 


